English for Computer Students
Edited By Sassan Azad
Islamic Azad University of Roudehen
Fall 2009
Contents
Computer
Operating system
Computer networking
Internet
Programming language
Windows XP
Windows Vista
Algorithm
Compiler
Network topology
Computer
The NASA Columbia Supercomputer.
A computer in a wristwatch.
A computer is a machine which manipulates data according to a list of instructions.
Computers take numerous physical forms. The first devices that resemble modern
computers date to the mid-20th century (around 1940 - 1941), although the
computer concept and various machines similar to computers existed earlier. Early
electronic computers were the size of a large room, consuming as much power as
several hundred modern personal computers. Modern computers are based on
comparatively tiny integrated circuits and are millions to billions of times more
capable while occupying a fraction of the space. Today, simple computers may be
made small enough to fit into a wrist watch and be powered from a watch battery.
Personal computers in various forms are icons of the information age and are what
most people think of as "a computer". However, the most common form of computer
in use today is by far the embedded computer. Embedded computers are small,
simple devices that are often used to control other devices—for example, they may
be found in machines ranging from fighter aircraft to industrial robots, digital
cameras, and even children's toys.
The ability to store and execute lists of instructions called programs makes
computers extremely versatile and distinguishes them from calculators. The Church–
Turing thesis is a mathematical statement of this versatility: Any computer with a
certain minimum capability is, in principle, capable of performing the same tasks that
any other computer can perform. Therefore, computers with capability and
complexity ranging from that of a personal digital assistant to a supercomputer are
all able to perform the same computational tasks given enough time and storage
capacity.
History of computing
The Jacquard loom was one of the first programmable devices.
It is difficult to identify any one device as the earliest computer, partly because the
term "computer" has been subject to varying interpretations over time.
Originally, the term "computer" referred to a person who performed numerical
calculations (a human computer), often with the aid of a mechanical calculating
device. Examples of early mechanical computing devices included the abacus, the
slide rule and arguably the astrolabe and the Antikythera mechanism (which dates
from about 150-100 BC). The end of the Middle Ages saw a re-invigoration of
European mathematics and engineering, and Wilhelm Schickard's 1623 device was
the first of a number of mechanical calculators constructed by European engineers.
However, none of those devices fit the modern definition of a computer because they
could not be programmed. In 1801, Joseph Marie Jacquard made an improvement to
the textile loom that used a series of punched paper cards as a template to allow his
loom to weave intricate patterns automatically. The resulting Jacquard loom was an
important step in the development of computers because the use of punched cards
to define woven patterns can be viewed as an early, albeit limited, form of
programmability.
In 1837, Charles Babbage was the first to conceptualize and design a fully
programmable mechanical computer that he called "The Analytical Engine". Due to
limited finance, and an inability to resist tinkering with the design, Babbage never
actually built his Analytical Engine.
Large-scale automated data processing of punched cards was performed for the U.S.
Census in 1890 by tabulating machines designed by Herman Hollerith and
manufactured by the Computing Tabulating Recording Corporation, which later
became IBM. By the end of the 19th century a number of technologies that would
later prove useful in the realization of practical computers had begun to appear: the
punched card, Boolean algebra, the vacuum tube (thermionic valve) and the
teleprinter.
During the first half of the 20th century, many scientific computing needs were met
by increasingly sophisticated analog computers, which used a direct mechanical or
electrical model of the problem as a basis for computation. However, these were not
programmable and generally lacked the versatility and accuracy of modern digital
computers.
Defining characteristics of five early digital computers

Computer | First operation | Place | Decimal/Binary | Electronic | Programmable | Turing complete
Zuse Z3 | May 1941 | Germany | binary | No | By punched film stock | Yes (1998)
Atanasoff–Berry Computer | Summer 1941 | USA | binary | Yes | No | No
Colossus | December 1943 / January 1944 | UK | binary | Yes | Partially, by rewiring | No
Harvard Mark I – IBM ASCC | 1944 | USA | decimal | No | By punched paper tape | No
ENIAC | 1944 | USA | decimal | Yes | Partially, by rewiring | Yes
ENIAC | 1948 | USA | decimal | Yes | By Function Table ROM | Yes
A succession of steadily more powerful and flexible computing devices was
constructed in the 1930s and 1940s, gradually adding the key features that are seen
in modern computers. The use of digital electronics (largely invented by Claude
Shannon in 1937) and more flexible programmability were vitally important steps,
but defining one point along this road as "the first digital electronic computer" is
difficult (Shannon 1940). Notable achievements include:
Konrad Zuse's electromechanical "Z machines". The Z3 (1941) was the first
working machine featuring binary arithmetic, including floating point
arithmetic and a measure of programmability. In 1998 the Z3 was proved to
be Turing complete, therefore being the world's first operational computer.
The non-programmable Atanasoff–Berry Computer (1941) which used
vacuum tube based computation, binary numbers, and regenerative capacitor
memory.
The secret British Colossus computer (1944), which had limited
programmability but demonstrated that a device using thousands of tubes
could be reasonably reliable and electronically reprogrammable. It was used
for breaking German wartime codes.
The Harvard Mark I (1944), a large-scale electromechanical computer with
limited programmability.
The U.S. Army's Ballistics Research Laboratory ENIAC (1946), which used
decimal arithmetic and is sometimes called the first general purpose
electronic computer (since Konrad Zuse's Z3 of 1941 used electromagnets
instead of electronics). Initially, however, ENIAC had an inflexible architecture
which essentially required rewiring to change its programming.
Several developers of ENIAC, recognizing its flaws, came up with a far more flexible
and elegant design, which came to be known as the stored program architecture
or von Neumann architecture. This design was first formally described by John von
Neumann in the paper "First Draft of a Report on the EDVAC", published in 1945. A
number of projects to develop computers based on the stored program architecture
commenced around this time, the first of these being completed in Great Britain. The
first to be demonstrated working was the Manchester Small-Scale Experimental
Machine (SSEM) or "Baby". However, the EDSAC, completed a year after SSEM, was
perhaps the first practical implementation of the stored program design. Shortly
thereafter, the machine originally described by von Neumann's paper—EDVAC—was
completed but did not see full-time use for an additional two years.
Nearly all modern computers implement some form of the stored program
architecture, making it the single trait by which the word "computer" is now defined.
By this standard, many earlier devices would no longer be called computers by
today's definition, but are usually referred to as such in their historical context. While
the technologies used in computers have changed dramatically since the first
electronic, general-purpose computers of the 1940s, most still use the von Neumann
architecture. The design made the universal computer a practical reality.
Microprocessors are miniaturized devices that often implement stored program CPUs.
Vacuum tube-based computers were in use throughout the 1950s, but were largely
replaced in the 1960s by transistor-based devices, which were smaller, faster,
cheaper, used less power and were more reliable. These factors allowed computers
to be produced on an unprecedented commercial scale. By the 1970s, the adoption
of integrated circuit technology and the subsequent creation of microprocessors such
as the Intel 4004 caused another leap in size, speed, cost and reliability. By the
1980s, computers had become sufficiently small and cheap to replace simple
mechanical controls in domestic appliances such as washing machines. Around the
same time, computers became widely accessible for personal use by individuals in
the form of home computers and the now ubiquitous personal computer. In
conjunction with the widespread growth of the Internet since the 1990s, personal
computers are becoming as common as the television and the telephone and almost
all modern electronic devices contain a computer of some kind.
Stored program architecture
The defining feature of modern computers which distinguishes them from all other
machines is that they can be programmed. That is to say that a list of instructions
(the program) can be given to the computer and it will store them and carry them
out at some time in the future.
In most cases, computer instructions are simple: add one number to another, move
some data from one location to another, send a message to some external device,
etc. These instructions are read from the computer's memory and are generally
carried out (executed) in the order they were given. However, there are usually
specialized instructions to tell the computer to jump ahead or backwards to some
other place in the program and to carry on executing from there. These are called
"jump" instructions (or branches). Furthermore, jump instructions may be made to
happen conditionally so that different sequences of instructions may be used
depending on the result of some previous calculation or some external event. Many
computers directly support subroutines by providing a type of jump that
"remembers" the location it jumped from and another instruction to return to the
instruction following that jump instruction.
Program execution might be likened to reading a book. While a person will normally
read each word and line in sequence, they may at times jump back to an earlier
place in the text or skip sections that are not of interest. Similarly, a computer may
sometimes go back and repeat the instructions in some section of the program over
and over again until some internal condition is met. This is called the flow of control
within the program and it is what allows the computer to perform tasks repeatedly
without human intervention.
Comparatively, a person using a pocket calculator can perform a basic arithmetic
operation such as adding two numbers with just a few button presses. But to add
together all of the numbers from 1 to 1,000 would take thousands of button presses
and a lot of time—with a near certainty of making a mistake. On the other hand, a
computer may be programmed to do this with just a few simple instructions. For
example:
mov #0,sum ; set sum to 0
mov #1,num ; set num to 1
loop: add num,sum ; add num to sum
add #1,num ; add 1 to num
cmp num,#1000 ; compare num to 1000
ble loop ; if num <= 1000, go back to 'loop'
halt ; end of program. stop running
Once told to run this program, the computer will perform the repetitive addition task
without further human intervention. It will almost never make a mistake and a
modern PC can complete the task in about a millionth of a second.
However, computers cannot "think" for themselves in the sense that they only solve
problems in exactly the way they are programmed to. An intelligent human faced
with the above addition task might soon realize that instead of actually adding up all
the numbers, one can simply use the equation 1 + 2 + ... + 1,000 = (1,000 × 1,001) / 2 and
arrive at the correct answer (500,500) with little work. In other words, a computer programmed to add up the
numbers one by one as in the example above would do exactly that without regard
to efficiency or alternative solutions.
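The same contrast can be shown in a high-level language. The C sketch below is not part of the original text and its names are purely illustrative; the loop mirrors the assembly program above, while the closed-form formula n(n + 1)/2 gives the answer in a single step.

#include <stdio.h>

int main(void)
{
    /* Loop version: mirrors the assembly program above. */
    int sum = 0;
    for (int num = 1; num <= 1000; num++)
        sum += num;

    /* Closed-form version: n(n + 1)/2 gives the same result directly. */
    int n = 1000;
    int formula = n * (n + 1) / 2;

    printf("loop = %d, formula = %d\n", sum, formula);  /* both print 500500 */
    return 0;
}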
Programs
A 1970s punched card containing one line from a FORTRAN program. The card reads:
"Z(1) = Y + W(1)" and is labelled "PROJ039" for identification purposes.
In practical terms, a computer program might include anywhere from a dozen
instructions to many millions of instructions for something like a word processor or a
web browser. A typical modern computer can execute billions of instructions every
second and nearly never make a mistake over years of operation.
Large computer programs may take teams of computer programmers years to write
and it is unlikely that the entire program has been written completely in the
manner intended. Errors in computer programs are called bugs.
Sometimes bugs are benign and do not affect the usefulness of the program, in other
cases they might cause the program to completely fail (crash), in yet other cases
there may be subtle problems. Sometimes otherwise benign bugs may be used for
malicious intent, creating a security exploit. Bugs are usually not the fault of the
computer. Since computers merely execute the instructions they are given, bugs are
nearly always the result of programmer error or an oversight made in the program's
design.
In most computers, individual instructions are stored as machine code with each
instruction being given a unique number (its operation code or opcode for short).
The command to add two numbers together would have one opcode, the command
to multiply them would have a different opcode and so on. The simplest computers
are able to perform any of a handful of different instructions; the more complex
computers have several hundred to choose from—each with a unique numerical
code. Since the computer's memory is able to store numbers, it can also store the
instruction codes. This leads to the important fact that entire programs (which are
just lists of instructions) can be represented as lists of numbers and can themselves
be manipulated inside the computer just as if they were numeric data. The
fundamental concept of storing programs in the computer's memory alongside the
data they operate on is the crux of the von Neumann, or stored program,
architecture. In some cases, a computer might store some or all of its program in
memory that is kept separate from the data it operates on. This is called the Harvard
architecture after the Harvard Mark I computer. Modern von Neumann computers
display some traits of the Harvard architecture in their designs, such as in CPU
caches.
While it is possible to write computer programs as long lists of numbers (machine
language) and this technique was used with many early computers, it is extremely
tedious to do so in practice, especially for complicated programs. Instead, each basic
instruction can be given a short name that is indicative of its function and easy to
remember—a mnemonic such as ADD, SUB, MULT or JUMP. These mnemonics are
collectively known as a computer's assembly language. Converting programs written
in assembly language into something the computer can actually understand
(machine language) is usually done by a computer program called an assembler.
Machine languages and the assembly languages that represent them (collectively
termed low-level programming languages) tend to be unique to a particular type of
computer. For instance, an ARM architecture computer (such as may be found in a
PDA or a hand-held videogame) cannot understand the machine language of an Intel
Pentium or the AMD Athlon 64 computer that might be in a PC.
Though considerably easier than in machine language, writing long programs in
assembly language is often difficult and error prone. Therefore, most complicated
programs are written in more abstract high-level programming languages that are
able to express the needs of the computer programmer more conveniently (and
thereby help reduce programmer error). High level languages are usually "compiled"
into machine language (or sometimes into assembly language and then into machine
language) using another computer program called a compiler. Since high level
languages are more abstract than assembly language, it is possible to use different
compilers to translate the same high level language program into the machine
language of many different types of computer. This is part of the means by which
software like video games may be made available for different computer
architectures such as personal computers and various video game consoles.
The task of developing large software systems is an immense intellectual effort. It
has proven, historically, to be very difficult to produce software with an acceptably
high reliability, on a predictable schedule and budget. The academic and professional
discipline of software engineering concentrates specifically on this problem.
Example
A traffic light showing red.
Suppose a computer is being employed to drive a traffic light. A simple stored
program might say:
1. Turn off all of the lights
2. Turn on the red light
3. Wait for sixty seconds
4. Turn off the red light
5. Turn on the green light
6. Wait for sixty seconds
7. Turn off the green light
8. Turn on the yellow light
9. Wait for two seconds
10. Turn off the yellow light
11. Jump to instruction number (2)
With this set of instructions, the computer would cycle the light continually through
red, green, yellow and back to red again until told to stop running the program.
However, suppose there is a simple on/off switch connected to the computer that is
intended to be used to make the light flash red while some maintenance operation is
being performed. The program might then instruct the computer to:
1. Turn off all of the lights
2. Turn on the red light
3. Wait for sixty seconds
4. Turn off the red light
5. Turn on the green light
6. Wait for sixty seconds
7. Turn off the green light
8. Turn on the yellow light
9. Wait for two seconds
10. Turn off the yellow light
11. If the maintenance switch is NOT turned on then jump to instruction number 2
12. Turn on the red light
13. Wait for one second
14. Turn off the red light
15. Wait for one second
16. Jump to instruction number 11
In this manner, the computer is either running the instructions from number (2) to
(11) over and over or it is running the instructions from (11) down to (16) over and
over, depending on the position of the switch.
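As an illustration only (not part of the original text), the C sketch below follows the second list of instructions directly. The functions turn_on, turn_off, wait_seconds and maintenance_switch_on are hypothetical stand-ins for whatever hardware interface a real controller would have; here they simply print what they would do. Like the numbered program, it only checks the switch after the yellow light.

#include <stdio.h>

/* Stub hardware interface -- real code would talk to the signal hardware.
   These names are illustrative, not part of the original text. */
enum { RED, GREEN, YELLOW };
static const char *light_name[] = { "red", "green", "yellow" };

static void turn_on(int light)  { printf("turn on  %s light\n", light_name[light]); }
static void turn_off(int light) { printf("turn off %s light\n", light_name[light]); }
static void wait_seconds(int s) { printf("wait %d seconds\n", s); }
static int  maintenance_switch_on(void) { return 0; }   /* pretend the switch is off */

int main(void)
{
    turn_off(RED); turn_off(GREEN); turn_off(YELLOW);          /* instruction 1 */

    for (int cycle = 0; cycle < 2; cycle++) {                  /* a real controller would loop forever */
        turn_on(RED);    wait_seconds(60); turn_off(RED);      /* instructions 2-4  */
        turn_on(GREEN);  wait_seconds(60); turn_off(GREEN);    /* instructions 5-7  */
        turn_on(YELLOW); wait_seconds(2);  turn_off(YELLOW);   /* instructions 8-10 */

        /* Instructions 11-16: flash red for as long as the maintenance switch is on. */
        while (maintenance_switch_on()) {
            turn_on(RED);  wait_seconds(1);
            turn_off(RED); wait_seconds(1);
        }
    }
    return 0;
}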
How computers work
A general purpose computer has four main sections: the arithmetic and logic unit
(ALU), the control unit, the memory, and the input and output devices (collectively
termed I/O). These parts are interconnected by busses, often made of groups of
wires.
The control unit, ALU, registers, and basic I/O (and often other hardware closely
linked with these) are collectively known as a central processing unit (CPU). Early
CPUs were composed of many separate components but since the mid-1970s CPUs
have typically been constructed on a single integrated circuit called a
microprocessor.
Control unit
The control unit (often called a control system or central controller) directs the
various components of a computer. It reads and interprets (decodes) instructions in
the program one by one. The control system decodes each instruction and turns it
into a series of control signals that operate the other parts of the computer. Control
systems in advanced computers may change the order of some instructions so as to
improve performance.
A key component common to all CPUs is the program counter, a special memory cell
(a register) that keeps track of which location in memory the next instruction is to be
read from.
Diagram showing how a particular MIPS architecture instruction would be decoded by the control system.
The control system's function is as follows—note that this is a simplified description
and some of these steps may be performed concurrently or in a different order
depending on the type of CPU:
1. Read the code for the next instruction from the cell indicated by the program
counter.
2. Decode the numerical code for the instruction into a set of commands or
signals for each of the other systems.
3. Increment the program counter so it points to the next instruction.
4. Read whatever data the instruction requires from cells in memory (or perhaps
from an input device). The location of this required data is typically stored
within the instruction code.
5. Provide the necessary data to an ALU or register.
6. If the instruction requires an ALU or specialized hardware to complete,
instruct the hardware to perform the requested operation.
7. Write the result from the ALU back to a memory location or to a register or
perhaps an output device.
8. Jump back to step (1).
Since the program counter is (conceptually) just another set of memory cells, it can
be changed by calculations done in the ALU. Adding 100 to the program counter
would cause the next instruction to be read from a place 100 locations further down
the program. Instructions that modify the program counter are often known as
"jumps" and allow for loops (instructions that are repeated by the computer) and
often conditional instruction execution (both examples of control flow).
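The cycle described above can be sketched as a tiny interpreter written in C. The three-instruction machine below is invented purely for illustration (it is not any real processor's instruction set), but its main loop performs the same steps: fetch the instruction at the program counter, decode it, read its operands from memory, execute it and write the result back.

#include <stdio.h>

/* A made-up instruction set, purely for illustration -- it is not any real
   machine's opcode table. Every instruction occupies three memory cells:
   an opcode followed by two operands. */
enum { OP_ADD = 1,    /* mem[a] = mem[a] + mem[b]                     */
       OP_JNZ = 2,    /* if mem[a] != 0, set the program counter to b */
       OP_HALT = 3 }; /* stop running                                 */

int main(void)
{
    /* Program and data share one memory, as in the stored program design.
       Cells 0-14 hold the program; cells 20-24 hold its data. */
    int mem[32] = {
        OP_ADD, 20, 21,    /*  0: sum   = sum + num                 */
        OP_ADD, 21, 22,    /*  3: num   = num + 1                   */
        OP_ADD, 23, 24,    /*  6: count = count + (-1)              */
        OP_JNZ, 23, 0,     /*  9: if count != 0, jump back to 0     */
        OP_HALT, 0, 0,     /* 12: done                              */
    };
    mem[20] = 0;   /* sum             */
    mem[21] = 1;   /* num             */
    mem[22] = 1;   /* the constant 1  */
    mem[23] = 5;   /* loop counter    */
    mem[24] = -1;  /* the constant -1 */

    int pc = 0;                                              /* program counter */
    for (;;) {
        int op = mem[pc], a = mem[pc + 1], b = mem[pc + 2];  /* steps 1-2: fetch and decode */
        pc += 3;                                             /* step 3: point at the next instruction */
        if (op == OP_ADD)      mem[a] += mem[b];             /* steps 4-7: read data, execute, write back */
        else if (op == OP_JNZ) { if (mem[a] != 0) pc = b; }
        else                   break;                        /* OP_HALT */
    }
    printf("sum of 1..5 = %d\n", mem[20]);                   /* prints 15 */
    return 0;
}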
It is noticeable that the sequence of operations that the control unit goes through to
process an instruction is in itself like a short computer program - and indeed, in
some more complex CPU designs, there is another yet smaller computer called a
microsequencer that runs a microcode program that causes all of these events to
happen.
Arithmetic/logic unit (ALU)
The ALU is capable of performing two classes of operations: arithmetic and logic.
The set of arithmetic operations that a particular ALU supports may be limited to
adding and subtracting or might include multiplying or dividing, trigonometry
functions (sine, cosine, etc.) and square roots. Some can only operate on whole
numbers (integers) whilst others use floating point to represent real numbers—albeit
with limited precision. However, any computer that is capable of performing just the
simplest operations can be programmed to break down the more complex operations
into simple steps that it can perform. Therefore, any computer can be programmed
to perform any arithmetic operation—although it will take more time to do so if its
ALU does not directly support the operation. An ALU may also compare numbers and
return boolean truth values (true or false) depending on whether one is equal to,
greater than or less than the other ("is 64 greater than 65?").
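As a small, illustrative sketch of this idea (not taken from the original text): a machine whose ALU can only add and compare can still be programmed to multiply, simply by adding repeatedly.

#include <stdio.h>

/* Multiply two non-negative integers using only addition and comparison,
   the way a computer without a hardware multiplier could be programmed. */
unsigned multiply(unsigned a, unsigned b)
{
    unsigned result = 0;
    for (unsigned i = 0; i < b; i++)   /* add a to the result, b times */
        result += a;
    return result;
}

int main(void)
{
    printf("7 x 6 = %u\n", multiply(7, 6));   /* prints 42 */
    return 0;
}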
Logic operations involve Boolean logic: AND, OR, XOR and NOT. These can be useful
both for creating complicated conditional statements and processing boolean logic.
Superscalar computers contain multiple ALUs so that they can process several
instructions at the same time. Graphics processors and computers with SIMD and
MIMD features often provide ALUs that can perform arithmetic on vectors and
matrices.
Memory
Magnetic core memory was popular main memory for computers through the 1960s
until it was completely replaced by semiconductor memory.
A computer's memory can be viewed as a list of cells into which numbers can be
placed or read. Each cell has a numbered "address" and can store a single number.
The computer can be instructed to "put the number 123 into the cell numbered
1357" or to "add the number that is in cell 1357 to the number that is in cell 2468
and put the answer into cell 1595". The information stored in memory may represent
practically anything. Letters, numbers, even computer instructions can be placed into
memory with equal ease. Since the CPU does not differentiate between different
types of information, it is up to the software to give significance to what the memory
sees as nothing but a series of numbers.
In almost all modern computers, each memory cell is set up to store binary numbers
in groups of eight bits (called a byte). Each byte is able to represent 256 different
numbers; either from 0 to 255 or -128 to +127. To store larger numbers, several
consecutive bytes may be used (typically, two, four or eight). When negative
numbers are required, they are usually stored in two's complement notation. Other
arrangements are possible, but are usually not seen outside of specialized
applications or historical contexts. A computer can store any kind of information in
memory as long as it can be somehow represented in numerical form. Modern
computers have billions or even trillions of bytes of memory.
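These ranges can be demonstrated with a short C sketch (illustrative only, not part of the original text) using fixed-width integer types: a single byte interpreted as 0 to 255 or, in two's complement, as -128 to +127, and a four-byte type for larger numbers.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t unsigned_byte = 255;        /* one byte read as 0..255                   */
    int8_t  signed_byte   = -128;       /* the same size read as -128..+127          */
    int32_t four_bytes    = 2000000000; /* larger numbers use several bytes (4 here) */

    printf("unsigned byte: %u\n", (unsigned)unsigned_byte);
    printf("signed byte:   %d\n", signed_byte);
    printf("four bytes:    %d\n", (int)four_bytes);

    /* In two's complement the bit pattern 11111111 means 255 when read as
       unsigned but -1 when read as signed. */
    uint8_t bits = 0xFF;
    printf("0xFF as unsigned = %u, as signed = %d\n", (unsigned)bits, (int)(int8_t)bits);
    return 0;
}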
The CPU contains a special set of memory cells called registers that can be read and
written to much more rapidly than the main memory area. There are typically
between two and one hundred registers depending on the type of CPU. Registers are
used for the most frequently needed data items to avoid having to access main
memory every time data is needed. Since data is constantly being worked on,
reducing the need to access main memory (which is often slow compared to the ALU
and control units) greatly increases the computer's speed.
Computer main memory comes in two principal varieties: random access memory or
RAM and read-only memory or ROM. RAM can be read and written to anytime the
CPU commands it, but ROM is pre-loaded with data and software that never changes,
so the CPU can only read from it. ROM is typically used to store the computer's initial
start-up instructions. In general, the contents of RAM are erased when the power to
the computer is turned off while ROM retains its data indefinitely. In a PC, the ROM
contains a specialized program called the BIOS that orchestrates loading the
computer's operating system from the hard disk drive into RAM whenever the
computer is turned on or reset. In embedded computers, which frequently do not
have disk drives, all of the software required to perform the task may be stored in
ROM. Software that is stored in ROM is often called firmware because it is notionally
more like hardware than software. Flash memory blurs the distinction between ROM
and RAM by retaining data when turned off but being rewritable like RAM. However,
flash memory is typically much slower than conventional ROM and RAM so its use is
restricted to applications where high speeds are not required.
In more sophisticated computers there may be one or more RAM cache memories
which are slower than registers but faster than main memory. Generally computers
with this sort of cache are designed to move frequently needed data into the cache
automatically, often without the need for any intervention on the programmer's part.
Input/output (I/O)
Hard disks are common I/O devices used with computers.
I/O is the means by which a computer receives information from the outside world
and sends results back. Devices that provide input or output to the computer are
called peripherals. On a typical personal computer, peripherals include input devices
like the keyboard and mouse, and output devices such as the display and printer.
Hard disk drives, floppy disk drives and optical disc drives serve as both input and
output devices. Computer networking is another form of I/O.
Often, I/O devices are complex computers in their own right with their own CPU and
memory. A graphics processing unit might contain fifty or more tiny computers that
perform the calculations necessary to display 3D graphics. Modern desktop
computers contain many smaller computers that assist the main CPU in performing
I/O.
Multitasking
While a computer may be viewed as running one gigantic program stored in its main
memory, in some systems it is necessary to give the appearance of running several
programs simultaneously. This is achieved by having the computer switch rapidly
between running each program in turn. One means by which this is done is with a
special signal called an interrupt which can periodically cause the computer to stop
executing instructions where it was and do something else instead. By remembering
where it was executing prior to the interrupt, the computer can return to that task
later. If several programs are running "at the same time", then the interrupt
generator might be causing several hundred interrupts per second, causing a
program switch each time. Since modern computers typically execute instructions
several orders of magnitude faster than human perception, it may appear that many
programs are running at the same time even though only one is ever executing in
any given instant. This method of multitasking is sometimes termed "time-sharing"
since each program is allocated a "slice" of time in turn.
Before the era of cheap computers, the principal use for multitasking was to allow
many people to share the same computer.
Seemingly, multitasking would cause a computer that is switching between several
programs to run more slowly - in direct proportion to the number of programs it is
running. However, most programs spend much of their time waiting for slow
input/output devices to complete their tasks. If a program is waiting for the user to
click on the mouse or press a key on the keyboard, then it will not take a "time slice"
until the event it is waiting for has occurred. This frees up time for other programs to
execute so that many programs may be run at the same time without unacceptable
speed loss.
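The idea of time slices, and of skipping programs that are waiting for input/output, can be sketched roughly as follows. This toy C example is invented for illustration and is nothing like a real scheduler: each "task" is just a counter, and the blocked task simply takes no slices until its imaginary I/O completes.

#include <stdio.h>

/* A toy illustration of time slicing: each "task" is only a counter here.
   A real operating system switches between real programs using interrupts. */
struct task {
    const char *name;
    int slices_left;     /* how many time slices of work remain        */
    int waiting_for_io;  /* 1 if blocked, e.g. waiting for the printer */
};

int main(void)
{
    struct task tasks[] = {
        { "editor",   3, 0 },
        { "printer",  2, 1 },   /* starts out blocked on I/O */
        { "compiler", 4, 0 },
    };
    int n = sizeof tasks / sizeof tasks[0];

    for (int round = 0; round < 4; round++) {
        for (int i = 0; i < n; i++) {
            if (tasks[i].slices_left == 0 || tasks[i].waiting_for_io)
                continue;                       /* finished or blocked tasks take no slice */
            printf("round %d: time slice -> %s\n", round, tasks[i].name);
            tasks[i].slices_left--;
        }
        if (round == 1)
            tasks[1].waiting_for_io = 0;        /* the printer's I/O completes */
    }
    return 0;
}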
Multiprocessing
Cray designed many supercomputers that used multiprocessing heavily.
Some computers divide their work among two or more separate CPUs,
creating a multiprocessing configuration. Traditionally, this technique was utilized
only in large and powerful computers such as supercomputers, mainframe computers
and servers. However, multiprocessor and multi-core (multiple CPUs on a single
integrated circuit) personal and laptop computers have become widely available and
are beginning to see increased usage in lower-end markets as a result.
Supercomputers in particular often have highly unique architectures that differ
significantly from the basic stored-program architecture and from general purpose
computers. They often feature thousands of CPUs, customized high-speed
interconnects, and specialized computing hardware. Such designs tend to be useful
only for specialized tasks due to the large scale of program organization required to
successfully utilize most of the available resources at once. Supercomputers
usually see usage in large-scale simulation, graphics rendering, and cryptography
applications, as well as with other so-called "embarrassingly parallel" tasks.
Networking and the Internet
Visualization of a portion of the routes on the Internet.
Computers have been used to coordinate information in multiple locations since the
1950s, with the U.S. military's SAGE system the first large-scale example of such a
system, which led to a number of special-purpose commercial systems like Sabre.
In the 1970s, computer engineers at research institutions throughout the United
States began to link their computers together using telecommunications technology.
This effort was funded by ARPA (now DARPA), and the computer network that it
produced was called the ARPANET. The technologies that made the Arpanet possible
spread and evolved. In time, the network spread beyond academic and military
institutions and became known as the Internet. The emergence of networking
involved a redefinition of the nature and boundaries of the computer. Computer
operating systems and applications were modified to include the ability to define and
access the resources of other computers on the network, such as peripheral devices,
stored information, and the like, as extensions of the resources of an individual
computer. Initially these facilities were available primarily to people working in high-
tech environments, but in the 1990s the spread of applications like e-mail and the
World Wide Web, combined with the development of cheap, fast networking
technologies like Ethernet and ADSL saw computer networking become almost
ubiquitous. In fact, the number of computers that are networked is growing
phenomenally. A very large proportion of personal computers regularly connect to
the Internet to communicate and receive information. "Wireless" networking, often
utilizing mobile phone networks, has meant networking is becoming increasingly
ubiquitous even in mobile computing environments.
Hardware
The term hardware covers all of those parts of a computer that are tangible objects.
Circuits, displays, power supplies, cables, keyboards, printers and mice are all
hardware.
History of computing hardware

First Generation (Mechanical/Electromechanical)
Calculators: Antikythera mechanism, Difference Engine, Norden bombsight
Programmable Devices: Jacquard loom, Analytical Engine, Harvard Mark I, Z3

Second Generation (Vacuum Tubes)
Calculators: Atanasoff–Berry Computer, IBM 604, UNIVAC 60, UNIVAC 120
Programmable Devices: ENIAC, EDSAC, EDVAC, UNIVAC I, IBM 701, IBM 702, IBM 650, Z22

Third Generation (Discrete transistors and SSI, MSI, LSI integrated circuits)
Mainframes: IBM 7090, IBM 7080, System/360, BUNCH
Minicomputer: PDP-8, PDP-11, System/32, System/36

Fourth Generation (VLSI integrated circuits)
Minicomputer: VAX, AS/400
4-bit microcomputer: Intel 4004, Intel 4040
8-bit microcomputer: Intel 8008, Intel 8080, Motorola 6800, Motorola 6809, MOS Technology 6502, Zilog Z80
16-bit microcomputer: 8088, Zilog Z8000, WDC 65816/65802
32-bit microcomputer: 80386, Pentium, 68000, ARM architecture
64-bit microcomputer: x86-64, PowerPC, MIPS, SPARC
Embedded computer: 8048, 8051
Personal computer: Desktop computer, Home computer, Laptop computer, Personal digital assistant (PDA), Portable computer, Tablet computer, Wearable computer

Theoretical/experimental: Quantum computer, Chemical computer, DNA computing, Optical computer, Spintronics based computer

Other Hardware Topics
Peripheral device (Input/output)
Input: Mouse, Keyboard, Joystick, Image scanner
Output: Monitor, Printer
Both: Floppy disk drive, Hard disk, Optical disc drive, Teleprinter
Computer busses
Short range: RS-232, SCSI, PCI, USB
Long range (Computer networking): Ethernet, ATM, FDDI
Software
Software refers to parts of the computer which do not have a material form, such
as programs, data, protocols, etc. When software is stored in hardware that cannot
easily be modified (such as BIOS ROM in an IBM PC compatible), it is sometimes
called "firmware" to indicate that it falls into an uncertain area somewhere between
hardware and software.
Computer software

Operating system
Unix/BSD: UNIX System V, AIX, HP-UX, Solaris (SunOS), IRIX, List of BSD operating systems
GNU/Linux: List of Linux distributions, Comparison of Linux distributions
Microsoft Windows: Windows 95, Windows 98, Windows NT, Windows XP, Windows Vista, Windows CE
DOS: 86-DOS (QDOS), PC-DOS, MS-DOS, FreeDOS
Mac OS: Mac OS classic, Mac OS X
Embedded and real-time: List of embedded operating systems
Experimental: Amoeba, Oberon/Bluebottle, Plan 9 from Bell Labs

Library
Multimedia: DirectX, OpenGL, OpenAL
Programming library: C standard library, Standard template library

Data
Protocol: TCP/IP, Kermit, FTP, HTTP, SMTP
File format: HTML, XML, JPEG, MPEG, PNG

User interface
Graphical user interface (WIMP): Microsoft Windows, GNOME, KDE, QNX Photon, CDE, GEM
Text user interface: Command line interface, shells
Other

Application
Office suite: Word processing, Desktop publishing, Presentation program, Database management system, Scheduling & Time management, Spreadsheet, Accounting software
Internet Access: Browser, E-mail client, Web server, Mail transfer agent, Instant messaging
Design and manufacturing: Computer-aided design, Computer-aided manufacturing, Plant management, Robotic manufacturing, Supply chain management
Graphics: Raster graphics editor, Vector graphics editor, 3D modeler, Animation editor, 3D computer graphics, Video editing, Image processing
Audio: Digital audio editor, Audio playback, Mixing, Audio synthesis, Computer music
Software Engineering: Compiler, Assembler, Interpreter, Debugger, Text Editor, Integrated development environment, Performance analysis, Revision control, Software configuration management
Educational: Edutainment, Educational game, Serious game, Flight simulator
Games: Strategy, Arcade, Puzzle, Simulation, First-person shooter, Platform, Massively multiplayer, Interactive fiction
Misc: Artificial intelligence, Antivirus software, Malware scanner, Installer/Package management systems, File manager
Programming languages
Programming languages provide various ways of specifying programs for computers
to run. Unlike natural languages, programming languages are designed to permit no
ambiguity and to be concise. They are purely written languages and are often
difficult to read aloud. They are generally either translated into machine language by
a compiler or an assembler before being run, or translated directly at run time by an
interpreter. Sometimes programs are executed by a hybrid method of the two
techniques. There are thousands of different programming languages—some
intended to be general purpose, others useful only for highly specialized applications.
Programming Languages
Lists of programming languages: Timeline of programming languages, Categorical list of programming languages, Generational list of programming languages, Alphabetical list of programming languages, Non-English-based programming languages
Commonly used Assembly languages: ARM, MIPS, x86
Commonly used High level languages: BASIC, C, C++, C#, COBOL, Fortran, Java, Lisp, Pascal
Commonly used Scripting languages: Bourne script, JavaScript, Python, Ruby, PHP, Perl
Professions and organizations
As the use of computers has spread throughout society, there are an increasing
number of careers involving computers. Following the theme of hardware, software
and firmware, the brains of people who work in the industry are sometimes known
irreverently as wetware or "meatware".
Computer-related professions
Hardware-related: Electrical engineering, Electronics engineering, Computer engineering, Telecommunications engineering, Optical engineering, Nanoscale engineering
Software-related: Human-computer interaction, Information technology, Software engineering, Scientific computing, Web design, Desktop publishing
The need for computers to work well together and to be able to exchange
information has spawned the need for many standards organizations, clubs and
societies of both a formal and informal nature.
Organizations
Standards groups: ANSI, IEC, IEEE, IETF, ISO, W3C
Professional Societies: ACM, ACM Special Interest Groups, IET, IFIP
Free/Open source software groups: Free Software Foundation, Mozilla Foundation, Apache Software Foundation
Notes
1. ^ In 1946, ENIAC consumed an estimated 174 kW. By comparison, a typical
personal computer may use around 400 W; over four hundred times less.
(Kempf 1961)
2. ^ Early computers such as Colossus and ENIAC were able to process between
5 and 100 operations per second. A modern "commodity" microprocessor (as
of 2007) can process billions of operations per second, and many of these
operations are more complicated and useful than early computer operations.
3. ^ The Analytical Engine should not be confused with Babbage's difference
engine which was a non-programmable mechanical calculator.
4. ^ This program was written similarly to those for the PDP-11 minicomputer
and shows some typical things a computer can do. All the text after the
semicolons are comments for the benefit of human readers. These have no
significance to the computer and are ignored. (Digital Equipment Corporation
1972)
5. ^ Attempts are often made to create programs that can overcome this
fundamental limitation of computers. Software that mimics learning and
adaptation is part of artificial intelligence.
6. ^ It is not universally true that bugs are solely due to programmer oversight.
Computer hardware may fail or may itself have a fundamental problem that
produces unexpected results in certain situations. For instance, the Pentium
FDIV bug caused some Intel microprocessors in the early 1990s to produce
inaccurate results for certain floating point division operations. This was
caused by a flaw in the microprocessor design and resulted in a partial recall
of the affected devices.
7. ^ Even some later computers were commonly programmed directly in
machine code. Some minicomputers like the DEC PDP-8 could be
programmed directly from a panel of switches. However, this method was
usually used only as part of the booting process. Most modern computers
boot entirely automatically by reading a boot program from some non-volatile
memory.
8. ^ However, there is sometimes some form of machine language compatibility
between different computers. An x86-64 compatible microprocessor like the
AMD Athlon 64 is able to run most of the same programs that an Intel Core 2
microprocessor can, as well as programs designed for earlier microprocessors
like the Intel Pentiums and Intel 80486. This contrasts with very early
commercial computers, which were often one-of-a-kind and totally
incompatible with other computers.
9. ^ High level languages are also often interpreted rather than compiled.
Interpreted languages are translated into machine code on the fly by another
program called an interpreter.
10. ^ Although this is a simple program, it contains a software bug. If the traffic
signal is showing red when someone switches the "flash red" switch, it will
cycle through green once more before starting to flash red as instructed. This
bug is quite easy to fix by changing the program to repeatedly test the switch
throughout each "wait" period—but writing large programs that have no bugs
is exceedingly difficult.
11. ^ The control unit's role in interpreting instructions has varied somewhat in
the past. While the control unit is solely responsible for instruction
interpretation in most modern computers, this is not always the case. Many
computers include some instructions that may only be partially interpreted by
the control system and partially interpreted by another device. This is
especially the case with specialized computing hardware that may be partially
self-contained. For example, EDVAC, the first modern stored program
computer to be designed, used a central control unit that only interpreted
four instructions. All of the arithmetic-related instructions were passed on to
its arithmetic unit and further decoded there.
12. ^ Instructions often occupy more than one memory address, so the program
counter usually increases by the number of memory locations required to
store one instruction.
13. ^ Flash memory also may only be rewritten a limited number of times before
wearing out, making it less useful for heavy random access usage. (Verma
1988)
14. ^ However, it is also very common to construct supercomputers out of many
pieces of cheap commodity hardware; usually individual computers connected
by networks. These so-called computer clusters can often provide
supercomputer performance at a much lower cost than customized designs.
While custom architectures are still used for most of the most powerful
supercomputers, there has been a proliferation of cluster computers in recent
years. (TOP500 2006)
15. ^ Most major 64-bit instruction set architectures are extensions of earlier
designs. All of the architectures listed in this table existed in 32-bit forms
before their 64-bit incarnations were introduced.
Operating system
An operating system (OS) is the software that manages the sharing of the
resources of a computer and provides programmers with an interface used to access
those resources. An operating system processes system data and user input, and
responds by allocating and managing tasks and internal system resources as a
service to users and programs of the system. At the foundation of all system
software, an operating system performs basic tasks such as controlling and
allocating memory, prioritizing system requests, controlling input and output devices,
facilitating networking and managing file systems. Most operating systems come with
an application that provides a user interface for managing the operating system,
such as a command line interpreter or graphical user interface. The operating system
forms a platform for other system software and for application software.
The most commonly-used contemporary desktop and laptop (notebook) OS is
Microsoft Windows. More powerful servers often employ Linux, FreeBSD, and other
Unix-like systems. However, Unix-like operating systems, most notably Mac OS X, are
also used on personal computers.
Services
Process management
Every program running on a computer, be it a service or an application, is a process.
As long as a von Neumann architecture is used to build computers, only one process
per CPU can be run at a time. Older microcomputer OSes such as MS-DOS did not
attempt to bypass this limit, with the exception of interrupt processing, and only one
process could be run under them (although DOS itself featured TSR as a very partial
and not too easy to use solution).
Most operating systems enable concurrent execution of many processes and
programs at once via multitasking, even with one CPU. This mechanism has been used in
mainframes since the early 1960s, but in personal computers it became available only
in the 1990s. Process management is an operating system's way of dealing with running
those multiple processes. On the most fundamental of computers (those containing
one processor with one core) multitasking is done by simply switching processes
quickly. Depending on the operating system, as more processes run, either each
time slice will become smaller or there will be a longer delay before each process is
given a chance to run. Process management involves computing and distributing CPU
time as well as other resources. Most operating systems allow a process to be
assigned a priority which affects its allocation of CPU time. Interactive operating
systems also employ some level of feedback in which the task with which the user is
working receives higher priority. Interrupt driven processes will normally run at a
very high priority. In many systems there is a background process, such as the
System Idle Process in Windows, which will run when no other process is waiting for
the CPU.
Memory management
Current computer architectures arrange the computer's memory in a hierarchical
manner, starting from the fastest registers, CPU cache, random access memory and
disk storage. An operating system's memory manager coordinates the use of these
various types of memory by tracking which one is available, which is to be allocated
or deallocated and how to move data between them. This activity, usually referred to
as virtual memory management, increases the amount of memory available for each
process by making the disk storage seem like main memory. There is a speed
penalty associated with using disks or other slower storage as memory – if running
processes require significantly more RAM than is available, the system may start
thrashing. This can happen either because one process requires a large amount of
RAM or because two or more processes compete for a larger amount of memory than
is available. This then leads to constant transfer of each process's data to slower
storage.
Another important part of memory management is managing virtual addresses. If
multiple processes are in memory at once, they must be prevented from interfering
with each other's memory (unless there is an explicit request to utilise shared
memory). This is achieved by having separate address spaces. Each process sees the
whole virtual address space, typically from address 0 up to the maximum size of
virtual memory, as uniquely assigned to it. The operating system maintains a page
table that matches virtual addresses to physical addresses. These memory allocations
are tracked so that when a process terminates, all memory used by that process can
be made available for other processes.
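A page table can be pictured as a table indexed by virtual page number. The C sketch below is illustrative only and far simpler than a real memory manager; it assumes 4 KB pages and a four-page address space, and returns -1 to stand for a page fault.

#include <stdio.h>

#define PAGE_SIZE 4096   /* 4 KB pages, a common choice                   */
#define NUM_PAGES 4      /* a toy address space of only four virtual pages */

/* page_table[v] holds the physical page that virtual page v maps to,
   or -1 if that page is not currently in RAM and would have to be paged in.
   A real system keeps one such table (usually multi-level) per process. */
static const int page_table[NUM_PAGES] = { 7, 3, -1, 12 };

static long translate(long virtual_address)
{
    long page   = virtual_address / PAGE_SIZE;   /* which virtual page?       */
    long offset = virtual_address % PAGE_SIZE;   /* position inside that page */
    if (page >= NUM_PAGES || page_table[page] < 0)
        return -1;                               /* page fault: not in RAM    */
    return (long)page_table[page] * PAGE_SIZE + offset;
}

int main(void)
{
    printf("virtual 0x1234 -> physical 0x%lx\n", translate(0x1234));
    printf("virtual 0x2010 -> %ld (page fault)\n", translate(0x2010));
    return 0;
}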
The operating system can also write inactive memory pages to secondary storage.
This process is called "paging" or "swapping" – the terminology varies between
operating systems.
It is also typical for operating systems to employ otherwise unused physical memory
as a page cache; requests for data from a slower device can be retained in memory
to improve performance. The operating system can also pre-load the in-memory
cache with data that may be requested by the user in the near future; SuperFetch is
an example of this.
Disk and file systems
Generally, operating systems include support for file systems.
Modern file systems comprise a hierarchy of directories. While the idea is
conceptually similar across all general-purpose file systems, some differences in
implementation exist. Two noticeable examples of this are the character used to
separate directories, and case sensitivity.
Unix demarcates its path components with a slash (/), a convention followed by
operating systems that emulated it or at least its concept of hierarchical directories,
such as Linux, Amiga OS and Mac OS X. MS-DOS also emulated this feature, but had
already also adopted the CP/M convention of using slashes for additional options to
commands, so instead used the backslash (\) as its component separator. Microsoft
Windows continues with this convention; Japanese editions of Windows use ¥, and
Korean editions use ₩. Prior to Mac OS X, versions of Mac OS use a colon (:) for a
path separator. RISC OS uses a period (.).
Unix and Unix-like operating systems allow for any character in file names other than
the slash and NUL characters (including line feed (LF) and other control characters).
Unix file names are case sensitive, which allows multiple files to be created with
names that differ only in case. By contrast, Microsoft Windows file names are not
case sensitive by default. Windows also has a larger set of punctuation characters
that are not allowed in file names.
File systems may provide journaling, which provides safe recovery in the event of a
system crash. A journaled file system writes information twice: first to the journal,
which is a log of file system operations, then to its proper place in the ordinary file
system. In the event of a crash, the system can recover to a consistent state by
replaying a portion of the journal. In contrast, non-journaled file systems typically
need to be examined in their entirety by a utility such as fsck or chkdsk. Soft
updates is an alternative to journalling that avoids the redundant writes by carefully
ordering the update operations. Log-structured file systems and ZFS also differ from
traditional journaled file systems in that they avoid inconsistencies by always writing
new copies of the data, eschewing in-place updates.
Many Linux distributions support some or all of ext2, ext3, ReiserFS, Reiser4, GFS,
GFS2, OCFS, OCFS2, and NILFS. Linux also has full support for XFS and JFS, along
with the FAT file systems, and NTFS.
Microsoft Windows includes support for FAT12, FAT16, FAT32, and NTFS. The NTFS
file system is the most efficient and reliable of the four Windows file systems, and as
of Windows Vista, is the only file system which the operating system can be installed
on. Windows Embedded CE 6.0 introduced ExFAT, a file system suitable for flash
drives.
Mac OS X supports HFS+ with journaling as its primary file system. It is derived from
the Hierarchical File System of the earlier Mac OS. Mac OS X has facilities to read
and write FAT16, FAT32, NTFS and other file systems, but cannot be installed to
them.
Common to all these (and other) operating systems is support for file systems
typically found on removable media. FAT12 is the file system most commonly found
on floppy discs. ISO 9660 and Universal Disk Format are two common formats that
target Compact Discs and DVDs, respectively. Mount Rainier is a newer extension to
UDF supported by Linux 2.6 kernels and Windows Vista that facilitates rewriting to
DVDs in the same fashion as has been possible with floppy disks.
Networking
Most current operating systems are capable of using the TCP/IP networking
protocols. This means that computers running dissimilar operating systems can
participate in a common network for sharing resources such as computing, files,
printers, and scanners using either wired or wireless connections.
Many operating systems also support one or more vendor-specific legacy networking
protocols as well, for example, SNA on IBM systems, DECnet on systems from Digital
Equipment Corporation, and Microsoft-specific protocols on Windows. Specific
protocols for specific tasks may also be supported such as NFS for file access.
Security
Many operating systems include some level of security. Security is based on the two
ideas that:
The operating system provides access to a number of resources, directly or
indirectly, such as files on a local disk, privileged system calls, personal
information about users, and the services offered by the programs running on
the system;
The operating system is capable of distinguishing between some requesters of
these resources who are authorized (allowed) to access the resource, and
others who are not authorized (forbidden). While some systems may simply
distinguish between "privileged" and "non-privileged", systems commonly
have a form of requester identity, such as a user name. Requesters, in turn,
divide into two categories:
o Internal security: an already running program. On some systems, a
program once it is running has no limitations, but commonly the
program has an identity which it keeps and is used to check all of its
requests for resources.
o External security: a new request from outside the computer, such as a
login at a connected console or some kind of network connection. To
establish identity there may be a process of authentication. Often a
username must be quoted, and each username may have a password.
Other methods of authentication, such as magnetic cards or biometric
data, might be used instead. In some cases, especially connections
from the network, resources may be accessed with no authentication
at all.
In addition to the allow/disallow model of security, a system with a high level of
security will also offer auditing options. These would allow tracking of requests for
access to resources (such as, "who has been reading this file?").
Security of operating systems has long been a concern because of highly sensitive
data held on computers, both of a commercial and military nature. The United States
Government Department of Defense (DoD) created the Trusted Computer System
Evaluation Criteria (TCSEC) which is a standard that sets basic requirements for
assessing the effectiveness of security. This became of vital importance to operating
system makers, because the TCSEC was used to evaluate, classify and select
computer systems being considered for the processing, storage and retrieval of
sensitive or classified information.
In December 2007, Apple Inc. released a security update for Mac OS X and Mac OS X
Server to address vulnerabilities in Flash and Shockwave, products of Adobe
Systems, and Tar, a GNU utility. Among the problems addressed are the quite
ordinary ability to "execute arbitrary code" and to "gain access to sensitive
information" or "cause a denial of service". More surprisingly, the update addressed
attackers' ability to "surreptitiously initiate a video conference".
Internal security
Internal security can be thought of as protecting the computer's resources from the
programs concurrently running on the system. Most operating systems run programs natively on the computer's processor, so the problem arises of how to stop these programs from performing the same operations and acquiring the same privileges as the operating system (which is, after all, just a program too). Processors used for general-purpose operating systems generally have a hardware concept of privilege: less privileged programs are automatically blocked from using certain hardware instructions, such as those that read from or write to external devices like disks. Instead, they have to ask the privileged program (the operating system kernel) to read or write.
The operating system therefore gets the chance to check the program's identity and
allow or refuse the request.
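As a small illustration of this arrangement, the Python snippet below (assuming a Unix-like system where /etc/shadow is readable only by privileged users) tries to read a protected file; the open and read calls are thin wrappers around system calls, and the kernel refuses the request on behalf of the operating system.

```python
# A small demonstration (Unix-like system assumed): an unprivileged program
# cannot read a protected file directly. The open()/read() calls below are
# thin wrappers around system calls, and the kernel checks the caller's
# identity and permissions before honouring them.
try:
    with open("/etc/shadow", "rb") as f:   # typically readable only by root
        data = f.read()
        print("read", len(data), "bytes")
except PermissionError:
    # The kernel refused the request on behalf of the operating system.
    print("access denied by the operating system")
```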
An alternative strategy, and the only sandbox strategy available in systems that do not meet the Popek and Goldberg virtualization requirements, is for the operating system not to run user programs as native code, but instead to emulate a processor or to provide a host for a p-code based system such as Java.
Internal security is especially relevant for multi-user systems; it allows each user of
the system to have private files that the other users cannot tamper with or read.
Internal security is also vital if auditing is to be of any use, since a program that can bypass the operating system can also bypass auditing.
External security
Typically an operating system offers (or hosts) various services to other network computers and users. These services are usually provided through ports, or numbered access points, beyond the operating system's network address. Services include offerings such as file sharing, print services, email, web sites, and file transfer protocols (FTP), most of which can have their security compromised.
At the front line of security are hardware devices known as firewalls or intrusion
detection/prevention systems. At the operating system level, there are a number of
software firewalls available, as well as intrusion detection/prevention systems. Most
modern operating systems include a software firewall, which is enabled by default. A
software firewall can be configured to allow or deny network traffic to or from a
service or application running on the operating system. Therefore, one can install and run an insecure service, such as Telnet or FTP, and not be threatened by a security breach, because the firewall denies all traffic trying to connect to the service on that port.
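A software firewall's decision logic can be pictured as a simple rule check. The Python sketch below is a toy illustration only, with an invented rule table rather than any real firewall configuration.

```python
# A toy sketch of the rule check a software firewall performs; the rule
# table below is invented for illustration and is not a real configuration.
ALLOWED_PORTS = {80, 443}   # permit web traffic
BLOCKED_PORTS = {21, 23}    # deny FTP and Telnet, even if the services run

def permit(packet_port):
    if packet_port in BLOCKED_PORTS:
        return False
    return packet_port in ALLOWED_PORTS

for port in (80, 23, 443, 21):
    print(port, "allowed" if permit(port) else "denied")
```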
Graphical user interfaces
Today, most operating systems contain a Graphical User Interface (GUI). A few older operating systems tightly integrated the GUI into the kernel; for example, in the original implementations of Microsoft Windows and Mac OS, the graphical subsystem was actually part of the kernel. More modern operating systems are modular, separating the graphics subsystem from the kernel (as is now done in Linux and Mac OS X).
Many operating systems allow the user to install or create any user interface they
desire. The X Window System in conjunction with GNOME or KDE is a commonly
found setup on most Unix and Unix-like (BSD, Linux, Minix) systems.
Graphical user interfaces evolve over time. For example, Windows has modified its
user interface almost every time a new major version of Windows is released, and
the Mac OS GUI changed dramatically with the introduction of Mac OS X in 2001.
Device drivers
A device driver is a specific type of computer software developed to allow interaction
with hardware devices. Typically this constitutes an interface for communicating with
the device, through the specific computer bus or communications subsystem that the
hardware is connected to, providing commands to and/or receiving data from the
device, and on the other end, the requisite interfaces to the operating system and
software applications. It is a specialized hardware-dependent computer program
which is also operating system specific that enables another program, typically an
operating system or applications software package or computer program running
under the operating system kernel, to interact transparently with a hardware device,
and usually provides the requisite interrupt handling necessary for any necessary
asynchronous time-dependent hardware interfacing needs.
The key design goal of device drivers is abstraction. Every model of hardware (even
within the same class of device) is different. Manufacturers also release newer models that provide more reliable or better performance, and these newer models are often controlled differently. Computers and their operating systems cannot be expected to know how to control every device, both now and in the future. To solve this problem, OSes essentially dictate how every type of device should be controlled. The function of the device driver is then to translate these OS-mandated function calls into device-specific calls. In theory, a new device, which is controlled in a new manner, should function correctly if a suitable driver is available. This new driver will ensure that the device appears to operate as usual from the operating system's point of view.
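The sketch below illustrates this abstraction in miniature, using Python classes in place of real drivers: the interface is fixed, while each "driver" translates it into its own device-specific behaviour. The class and vendor names are invented for the example.

```python
# A hypothetical sketch of driver abstraction: the operating system defines a
# generic interface, and each driver translates it into device-specific calls.
from abc import ABC, abstractmethod

class BlockDeviceDriver(ABC):          # the interface dictated by the "OS"
    @abstractmethod
    def read_block(self, block_number: int) -> bytes: ...

class VendorADiskDriver(BlockDeviceDriver):
    def read_block(self, block_number: int) -> bytes:
        # Imagine vendor-specific commands sent over the bus here.
        return b"data from vendor A, block %d" % block_number

class VendorBDiskDriver(BlockDeviceDriver):
    def read_block(self, block_number: int) -> bytes:
        # A different device, controlled differently, same interface.
        return b"data from vendor B, block %d" % block_number

# The rest of the system only sees the common interface.
for driver in (VendorADiskDriver(), VendorBDiskDriver()):
    print(driver.read_block(7))
```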
History
The first computers did not have operating systems. By the early 1960s, commercial
computer vendors were supplying quite extensive tools for streamlining the
development, scheduling, and execution of jobs on batch processing systems.
Examples were produced by UNIVAC and Control Data Corporation, amongst others.
Mainframes
Through the 1960s, many major features were pioneered in the field of operating
systems. The development of the IBM System/360 produced a family of mainframe
computers available in widely differing capacities and price points, for which a single
operating system OS/360 was planned (rather than developing ad-hoc programs for
every individual model). This concept of a single OS spanning an entire product line
was crucial for the success of System/360 and, in fact, IBM's current mainframe
operating systems are distant descendants of this original system; applications
written for the OS/360 can still be run on modern machines. OS/360 also contained
another important advance: the development of the hard disk permanent storage
device (which IBM called DASD).
Control Data Corporation developed the SCOPE operating system in the 1960s, for
batch processing. In cooperation with the University of Minnesota, the KRONOS and
later the NOS operating systems were developed during the 1970s, which supported
simultaneous batch and timesharing use. Like many commercial timesharing
systems, its interface was an extension of the Dartmouth BASIC operating systems,
one of the pioneering efforts in timesharing and programming languages. In the late
1970s, Control Data and the University of Illinois developed the PLATO operating
system, which used plasma panel displays and long-distance time sharing networks.
Plato was remarkably innovative for its time, featuring real-time chat, and multi-user
graphical games.
Burroughs Corporation introduced the B5000 in 1961 with the MCP (Master Control
Program) operating system. The B5000 was a stack machine designed to exclusively
support high-level languages with no machine language or assembler and indeed the
MCP was the first OS to be written exclusively in a high-level language (ESPOL, a
dialect of ALGOL). MCP also introduced many other ground-breaking innovations,
such as being the first commercial implementation of virtual memory. MCP is still in
use today in the Unisys ClearPath/MCP line of computers.
UNIVAC, the first commercial computer manufacturer, produced a series of EXEC
operating systems. Like all early main-frame systems, this was a batch-oriented
system that managed magnetic drums, disks, card readers and line printers. In the
1970s, UNIVAC produced the Real-Time Basic (RTB) system to support large-scale
time sharing, also patterned after the Dartmouth BASIC system.
General Electric and MIT developed General Electric Comprehensive Operating
Supervisor (GECOS), which introduced the concept of ringed security privilege levels.
After acquisition by Honeywell it was renamed to General Comprehensive Operating
System (GCOS).
Digital Equipment Corporation developed many operating systems for its various
computer lines, including TOPS-10 and TOPS-20 time sharing systems for the 36-bit
PDP-10 class systems. Prior to the widespread use of UNIX, TOPS-10 was a
particularly popular system in universities, and in the early ARPANET community.
In the late 1960s through the late 1970s, several hardware capabilities evolved that
allowed similar or ported software to run on more than one system. Early systems
had utilized microprogramming to implement features on their systems in order to permit different underlying architectures to appear to be the same as others in a series. In fact, most System/360 models after the 360/40 (except the 360/165 and 360/168) were
microprogrammed implementations. But soon other means of achieving application
compatibility were proven to be more significant.
The enormous investment in software for these systems made since the 1960s caused
most of the original computer manufacturers to continue to develop compatible
operating systems along with the hardware. The notable supported mainframe
operating systems include:
Burroughs MCP -- B5000, 1961 to Unisys Clearpath/MCP, present.
IBM OS/360 -- IBM System/360, 1964 to IBM z/OS, present.
IBM CP-67 -- IBM System/360, 1967 to IBM z/VM, present.
UNIVAC EXEC 8 -- UNIVAC 1108, 1964, to Unisys Clearpath IX, present.
Microcomputers
The first microcomputers did not have the capacity or need for the elaborate
operating systems that had been developed for mainframes and minis; minimalistic
operating systems were developed, often loaded from ROM and known as Monitors.
One notable early disk-based operating system was CP/M, which was supported on
many early microcomputers and was closely imitated in MS-DOS, which became
wildly popular as the operating system chosen for the IBM PC (IBM's version of it
was called IBM-DOS or PC-DOS), its successors making Microsoft one of the world's
most profitable companies. In the 1980s, Apple Computer Inc. (now Apple Inc.) abandoned its popular Apple II series of microcomputers to introduce the Apple Macintosh computer with an innovative Graphical User Interface (GUI) in the Mac OS.
The introduction of the Intel 80386 CPU chip, with its 32-bit architecture and paging capabilities, provided personal computers with the ability to run multitasking operating systems like those of earlier minicomputers and mainframes. Microsoft responded to this progress by hiring Dave Cutler, who had developed the VMS
operating system for Digital Equipment Corporation. He would lead the development
of the Windows NT operating system, which continues to serve as the basis for
Microsoft's operating systems line. Steve Jobs, a co-founder of Apple, started NeXT
Computer Inc., which developed the Unix-like NEXTSTEP operating system.
NEXTSTEP would later be acquired by Apple Inc. and used, along with code from
FreeBSD as the core of Mac OS X.
Minix, an academic teaching tool which could be run on early PCs, would inspire another reimplementation of Unix, called Linux. Started by computer science student Linus Torvalds with cooperation from volunteers over the Internet, the Linux project developed a kernel which was combined with the tools from the GNU Project. The Berkeley Software
Distribution, known as BSD, is the UNIX derivative distributed by the University of
California, Berkeley, starting in the 1970s. Freely distributed and ported to many
minicomputers, it eventually also gained a following for use on PCs, mainly as
FreeBSD, NetBSD and OpenBSD.
Unix-like operating systems
A customized KDE desktop running under Linux.
The Unix-like family is a diverse group of operating systems, with several major sub-
categories including System V, BSD, and Linux. The name "UNIX" is a trademark of
The Open Group which licenses it for use with any operating system that has been
shown to conform to their definitions. "Unix-like" is commonly used to refer to the
large set of operating systems which resemble the original Unix.
Unix systems run on a wide variety of machine architectures. They are used heavily
as server systems in business, as well as workstations in academic and engineering
environments. Free software Unix variants, such as Linux and BSD, are popular in
these areas. The market share for Linux is divided between many different
distributions. Enterprise class distributions by Red Hat or SuSe are used by
corporations, but some home users may use those products. Historically home users
typically installed a distribution themselves, but in 2007 Dell began to offer the
Ubuntu Linux distribution on home PCs. Linux on the desktop is also popular in the
developer and hobbyist operating system development communities.
Market share statistics for freely available operating systems are usually inaccurate
since most free operating systems are not purchased, making usage under-
represented. On the other hand, market share statistics based on total downloads of free operating systems are often inflated, as there is no economic disincentive to acquiring multiple operating systems: users can download several, test them, and decide which they like best.
Some Unix variants like HP's HP-UX and IBM's AIX are designed to run only on that
vendor's hardware. Others, such as Solaris, can run on multiple types of hardware,
including x86 servers and PCs. Apple's Mac OS X, a hybrid kernel-based BSD variant
derived from NeXTSTEP, Mach, and FreeBSD, has replaced Apple's earlier (non-Unix)
Mac OS.
Unix interoperability was sought by establishing the POSIX standard. The POSIX standard can be applied to any operating system, although it was originally created for various Unix variants.
Open source
Over the past several years, the trend in the Unix and Unix-like space has been to freely provide the software programming code. This process is called open source, although there was initially confusion, as "open" previously meant sharing code and standards only with the representatives of select companies.
Ken Thompson, Dennis Ritchie and Douglas McIlroy at Bell Labs designed and
developed the C programming language to build the operating system Unix.
Programmers at Bell Labs went on to develop Plan 9 and Inferno, which were
engineered for modern distributed environments. They had graphics built-in, unlike
Unix counterparts that added it to the design later. Plan 9 did not become popular
because, unlike many Unix distributions, it was not originally free. It has since been
released under Free Software and Open Source Lucent Public License, and has an
expanding community of developers. Inferno was sold to Vita Nuova Holdings and
has been released under a GPL/MIT license.
Mac OS X
Mac OS X is a line of proprietary, graphical operating systems developed, marketed,
and sold by Apple Inc., the latest of which is pre-loaded on all currently shipping
Macintosh computers. Mac OS X is the successor to the original Mac OS, which had
been Apple's primary operating system since 1984. Unlike its predecessor, Mac OS X
is a UNIX operating system built on technology that had been developed at NeXT
through the second half of the 1980s and up until Apple purchased the company in
early 1997.
The operating system was first released in 1999 as Mac OS X Server 1.0, with a
desktop-oriented version (Mac OS X v10.0) following in March 2001. Since then, five
more distinct "end-user" and "server" editions of Mac OS X have been released, the
most recent being Mac OS X v10.5, which was first made available in October 2007.
Releases of Mac OS X are named after big cats; Mac OS X v10.5 is usually referred
to by Apple and users as "Leopard".
The server edition, Mac OS X Server, is architecturally identical to its desktop
counterpart but usually runs on Apple's line of Macintosh server hardware. Mac OS X
Server includes workgroup management and administration software tools that
provide simplified access to key network services, including a mail transfer agent, a
Samba server, an LDAP server, a domain name server, and others.
Microsoft Windows
The Microsoft Windows family of operating systems originated as an add-on to the
older MS-DOS environment for the IBM PC. Modern versions are based on the newer
Windows NT core that was originally intended for OS/2 and borrowed from VMS.
Windows runs on x86, x86-64 and Itanium processors. Earlier versions also ran on
the DEC Alpha, MIPS, Fairchild (later Intergraph) Clipper and PowerPC architectures
(some work was done to port it to the SPARC architecture).
As of September 2007, Microsoft Windows holds a large amount of the worldwide
desktop market share. Windows is also used on servers, supporting applications such
as web servers and database servers. In recent years, Microsoft has spent significant
marketing and research & development money to demonstrate that Windows is
capable of running any enterprise application, which has resulted in consistent
price/performance records (see the TPC) and significant acceptance in the enterprise
market.
The most widely used version of the Microsoft Windows family is Windows XP,
released on October 25, 2001.
In November 2006, after more than five years of development work, Microsoft
released Windows Vista, a major new version of Microsoft Windows which contains a
large number of new features and architectural changes. Chief amongst these are a
new user interface and visual style called Windows Aero, a number of new security
features such as User Account Control, and new multimedia applications such as
Windows DVD Maker.
Microsoft has announced that a new version, codenamed Windows 7, will be released in 2010 or later.
Embedded systems
Embedded systems use a variety of dedicated operating systems. In some cases, the
"operating system" software is directly linked to the application to produce a
monolithic special-purpose program. In the simplest embedded systems, there is no
distinction between the OS and the application. Embedded systems with strict timing requirements use what are known as real-time operating systems.
Operating systems such as VxWorks, eCos, and Palm OS are unrelated to Unix and Windows. Windows CE is a descendant of Windows, and several embedded BSD and Linux distributions exist.
Hobby operating system development
Operating system development, or OSDev for short, has a large cult following as a hobby. Some operating systems, such as Linux, have grown out of hobby operating system projects. The design and implementation of an operating system requires skill and determination, and the term can cover anything from a basic "Hello World" boot loader to a fully featured kernel. One classic example of this is the Minix operating system -- an OS that was designed as a teaching tool but was heavily used by hobbyists before Linux eclipsed it in popularity.
Other
Older operating systems which are still used in niche markets include OS/2 from
IBM; Mac OS, the non-Unix precursor to Apple's Mac OS X; BeOS; XTS-300. Some,
most notably AmigaOS and RISC OS, continue to be developed as minority platforms
for enthusiast communities and specialist applications. OpenVMS, formerly from DEC, is still under active development by Hewlett-Packard.
Research and development of new operating systems continues. GNU Hurd is
designed to be backwards compatible with Unix, but with enhanced functionality and
a microkernel architecture. Singularity is a project at Microsoft Research to develop
an operating system with better memory protection based on the .Net managed code
model.
Computer networking
Computer networking is the engineering discipline concerned with communication
between computer systems or devices. Networking, routers, routing protocols, and
networking over the public Internet have their specifications defined in documents
called RFCs. Computer networking is sometimes considered a sub-discipline of
telecommunications, computer science, information technology and/or computer
engineering. Computer networks rely heavily upon the theoretical and practical
application of these scientific and engineering disciplines.
A computer network is any set of computers or devices connected to each other with
the ability to exchange data. Examples of networks are:
local area network (LAN), which is usually a small network constrained to a
small geographic area.
wide area network (WAN) that is usually a larger network that covers a large
geographic area.
wireless LANs and WANs (WLAN & WWAN), which are the wireless equivalents of the LAN and WAN.
All networks are interconnected to allow communication over a variety of different kinds of media, including twisted-pair copper wire cable, coaxial cable, optical fiber, and various wireless technologies. The devices can be separated by a few meters (e.g. via Bluetooth) or nearly unlimited distances (e.g. via the interconnections of the Internet).
Views of networks
Users and network administrators often have different views of their networks.
Often, users that share printers and some servers form a workgroup, which usually
means they are in the same geographic location and are on the same LAN. A
community of interest has less of a connotation of being in a local area, and should
be thought of as a set of arbitrarily located users who share a set of servers, and
possibly also communicate via peer-to-peer technologies.
Network administrators see networks from both physical and logical perspectives. The physical perspective involves geographic locations, physical cabling, and the network elements (e.g., routers, bridges and application layer gateways) that interconnect the physical media. Logical networks, called subnets in the TCP/IP architecture, map onto one or more physical media. For example, a common practice in a campus of buildings is to make a set of LAN cables in each building appear to be a common subnet, using virtual LAN (VLAN) technology.
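As a small illustration of what it means for addresses to share a subnet, the Python sketch below uses the standard ipaddress module; the addresses and prefix are made up for the example.

```python
# Illustration of the subnet idea using Python's ipaddress module; the
# addresses and prefix below are made up for the example.
import ipaddress

campus_subnet = ipaddress.ip_network("192.168.10.0/24")

for host in ("192.168.10.25", "192.168.10.200", "192.168.11.5"):
    addr = ipaddress.ip_address(host)
    status = "is on the subnet" if addr in campus_subnet else "is not on the subnet"
    print(host, status)
```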
Both users and administrators will be aware, to varying extents, of the trust and
scope characteristics of a network. Again using TCP/IP architectural terminology, an
intranet is a community of interest under private administration usually by an
enterprise, and is only accessible by authorized users (e.g. employees) (RFC 2547).
Intranets do not have to be connected to the Internet, but generally have a limited
connection. An extranet is an extension of an intranet that allows secure communications to users outside of the intranet (e.g. business partners, customers) (RFC 3547).
Informally, the Internet is the set of users, enterprises, and content providers that
are interconnected by Internet Service Providers (ISP). From an engineering
standpoint, the Internet is the set of subnets, and aggregates of subnets, which
share the registered IP address space and exchange information about the
reachability of those IP addresses using the Border Gateway Protocol. Typically, the
human-readable names of servers are translated to IP addresses, transparently to
users, via the directory function of the Domain Name System (DNS).
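The sketch below shows this translation from the programmer's side, using Python's standard socket module to ask the system's resolver for the addresses of a host; the host name is only an example and the calls require network access.

```python
# Resolving a human-readable name to an IP address via the system's DNS
# resolver (requires network access; the host name is just an example).
import socket

print(socket.gethostbyname("www.example.com"))       # an IPv4 address
for family, _, _, _, sockaddr in socket.getaddrinfo("www.example.com", 80):
    print(family, sockaddr)                           # may include IPv6 results
```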
Over the Internet, there can be business-to-business (B2B), business-to-consumer
(B2C) and consumer-to-consumer (C2C) communications. Especially when money or
sensitive information is exchanged, the communications are apt to be secured by
some form of communications security mechanism. Intranets and extranets can be
securely superimposed onto the Internet, without any access by general Internet
users, using secure Virtual Private Network (VPN) technology.
History
Before the advent of computer networks that were based upon some type of
telecommunications system, communication between calculation machines and early
computers was performed by human users by carrying instructions between them.
Much of the social behavior seen on today's Internet was demonstrably present in nineteenth-century telegraph networks, and arguably in even earlier networks using visual signals.
In September 1940 George Stibitz used a teletype machine to send instructions for a
problem set from his Model K at Dartmouth College in New Hampshire to his
Complex Number Calculator in New York and received results back by the same
means. Linking output systems like teletypes to computers was an interest at the
Advanced Research Projects Agency (ARPA) when, in 1962, J.C.R. Licklider was hired
and developed a working group he called the "Intergalactic Network", a precursor to
the ARPANet.
In 1964, researchers at Dartmouth developed the Dartmouth Time Sharing System
for distributed users of large computer systems. The same year, at MIT, a research
group supported by General Electric and Bell Labs used a computer (DEC's PDP-8) to
route and manage telephone connections.
Throughout the 1960s Leonard Kleinrock, Paul Baran and Donald Davies
independently conceptualized and developed network systems which used datagrams
or packets that could be used in a packet switched network between computer
systems.
The first widely used PSTN switch that used true computer control was the Western
Electric 1ESS switch, introduced in 1965.
In 1969 the University of California at Los Angeles, SRI (then the Stanford Research Institute), the University of California at Santa Barbara, and the University of Utah were connected as the beginning of the ARPANet network using 50 kbit/s circuits. Commercial services
using X.25, an alternative architecture to the TCP/IP suite, were deployed in 1972.
Computer networks, and the technologies needed to connect and communicate
through and between them, continue to drive computer hardware, software, and
peripherals industries. This expansion is mirrored by growth in the numbers and
types of users of networks from the researcher to the home user.
Today, computer networks are the core of modern communication. For example, all
modern aspects of the Public Switched Telephone Network (PSTN) are computer-
controlled, and telephony increasingly runs over the Internet Protocol, although not
necessarily the public Internet. The scope of communication has increased
significantly in the past decade and this boom in communications would not have
been possible without the progressively advancing computer network.
Networking methods
Networking is a complex part of computing that makes up most of the IT industry. Without networks, almost all modern communication would cease to function; it is because of networking that telephones, television, the Internet, and so on work.
One way to categorize computer networks is by their geographic scope, although many real-world networks interconnect Local Area Networks (LAN) via Wide Area Networks (WAN). These two (broad) types are:
Local area network (LAN)
A local area network is a network that spans a relatively small space and provides
services to a small number of people. Depending on the number of people that use a
Local Area Network, a peer-to-peer or client-server method of networking may be
used. A peer-to-peer network is one in which each client shares its resources with the other workstations in the network; examples are small office networks where resource use is minimal, and home networks. A client-server network is one in which every client is connected to the server and to each other. Client-server networks use servers in different capacities. These can be classified into two types: single-service servers, where the server performs one task such as file serving or print serving; and multi-purpose servers, which not only act as file and print servers but also perform calculations and use them to provide information to clients (web/intranet servers). Computers linked via Ethernet cable can be joined either directly (one computer to another) or via a network hub that allows multiple connections.
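As a minimal illustration of the client-server pattern, the Python sketch below runs a one-connection TCP server in a background thread and has a client connect to it on the same machine; the port number is arbitrary and the messages are invented.

```python
# A minimal client-server sketch over TCP, run on one machine for simplicity;
# the port number is arbitrary and the messages are invented.
import socket
import threading

# The "server" side: listen on a port and answer a single connection.
server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server_sock.bind(("127.0.0.1", 50007))
server_sock.listen(1)

def serve_one_client():
    conn, _addr = server_sock.accept()
    with conn:
        request = conn.recv(1024)
        conn.sendall(b"server received: " + request)

threading.Thread(target=serve_one_client, daemon=True).start()

# The "client" side: connect to the server and exchange one message.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
    client.connect(("127.0.0.1", 50007))
    client.sendall(b"hello from the client")
    print(client.recv(1024).decode())

server_sock.close()
```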
Historically, LANs have featured much higher speeds than WANs. This is not
necessarily the case when the WAN technology appears as Metro Ethernet,
implemented over optical transmission systems.
Wide area network (WAN)
A wide area network is a network in which a wide variety of resources are deployed across a large domestic area or internationally. An example of this is a multinational business that uses a WAN to interconnect its offices in different countries. The largest and best-known example of a WAN is the Internet, which is a network composed of many smaller networks. The Internet is considered the largest network in the world.
The PSTN (Public Switched Telephone Network) also is an extremely large network
that is converging to use Internet technologies, although not necessarily through the
public Internet.
A Wide Area Network involves communication through the use of a wide range of
different technologies. These technologies include Point-to-Point WANs such as Point-
to-Point Protocol (PPP) and High-Level Data Link Control (HDLC), Frame Relay, ATM
(Asynchronous Transfer Mode) and Sonet (Synchronous Optical Network). The differences between the WAN technologies lie in the switching capabilities they provide and the speed at which they send and receive bits of information (data).
For more information on WANs, see Frame Relay, ATM and Sonet.
Wireless networks (WLAN, WWAN)
A wireless network is basically the same as a LAN or a WAN but there are no wires
between hosts and servers. The data is transferred over sets of radio transceivers.
These types of networks are beneficial when it is too costly or inconvenient to run
the necessary cables. For more information, see Wireless LAN and Wireless wide area
network. The media access protocols for LANs come from the IEEE.
The most common IEEE 802.11 WLANs cover, depending on antennas, ranges from hundreds of meters to a few kilometers. For larger areas, communications satellites of various types, cellular radio, and wireless local loop (IEEE 802.16) all have advantages and disadvantages. Depending on the type of mobility needed, the relevant standards may come from the IETF or the ITU.
Network topology
The network topology defines the way in which computers, printers, and other
devices are connected, physically and logically. A network topology describes the
layout of the wire and devices as well as the paths used by data transmissions.
Commonly used topologies include:
Bus
Star
Tree (hierarchical)
Linear
Ring
Mesh
o partially connected
o fully connected (sometimes known as fully redundant)
The network topologies mentioned above are only a general representation of the kinds of topologies used in computer networks and are considered basic topologies.
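A topology can be described as a graph of which devices link to which. The Python sketch below builds adjacency lists for a small star and a small ring; the node names are invented for the example.

```python
# Topologies described as graphs: adjacency lists for a star and a ring.
# The node names ("switch", "pc1", ...) are invented for illustration.
def star(hub, leaves):
    links = {hub: list(leaves)}
    links.update({leaf: [hub] for leaf in leaves})
    return links

def ring(nodes):
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]] for i in range(n)}

print(star("switch", ["pc1", "pc2", "pc3"]))
print(ring(["pc1", "pc2", "pc3", "pc4"]))
```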
Internet
Visualization of the various routes through a portion of the Internet.
The Internet is a worldwide, publicly accessible series of interconnected computer
networks that transmit data by packet switching using the standard Internet Protocol
(IP). It is a "network of networks" that consists of millions of smaller domestic,
academic, business, and government networks, which together carry various
information and services, such as electronic mail, online chat, file transfer, and the
interlinked web pages and other resources of the World Wide Web (WWW).
Terminology
The Internet protocol suite is a collection of standards and protocols organized into
layers so that each layer provides the foundation and the services required by the
layer above. In this scheme, the Internet consists of the computers and networks
that handle Internet Protocol (IP) data packets. Transmission Control Protocol (TCP)
depends on IP and solves problems like data packets arriving out of order or not at
all. Next comes Hypertext Transfer Protocol (HTTP), which is an application layer
protocol. It runs on top of TCP/IP and provides user agents, such as web browsers,
with access to the files, documents and other resources of the World Wide Web
(WWW). More generally, the Internet is the world-wide network holding most data
communications together, while the World-Wide Web is just one of many applications
that the Internet can be used for.
History
Growth
Although the basic applications and guidelines that make the Internet possible had
existed for almost a decade, the network did not gain a public face until the 1990s.
On August 6, 1991, CERN, which straddles the border between France and
Switzerland, publicized the new World Wide Web project, two years after British
scientist Tim Berners-Lee had begun creating HTML, HTTP and the first few Web
pages at CERN.
An early popular web browser was ViolaWWW, patterned after HyperCard. It was eventually replaced in popularity by the Mosaic web browser. In 1993, the National
Center for Supercomputing Applications at the University of Illinois released version
1.0 of Mosaic, and by late 1994 there was growing public interest in the previously
academic/technical Internet. By 1996 usage of the word "Internet" had become
commonplace, and consequently, so had its misuse as a reference to the World Wide
Web.
Meanwhile, over the course of the decade, the Internet successfully accommodated
the majority of previously existing public computer networks (although some
networks, such as FidoNet, have remained separate). During the 1990s, it was
estimated that the Internet grew by 100% per year, with a brief period of explosive
growth in 1996 and 1997. This growth is often attributed to the lack of central
administration, which allows organic growth of the network, as well as the non-
proprietary open nature of the Internet protocols, which encourages vendor
interoperability and prevents any one company from exerting too much control over
the network.
University Students Appreciation and Contributions
New findings in the field of communications during the 1960s and 1970s were quickly adopted by universities across the United States. Their openness to technology and new ideas placed many of them amongst the first to appreciate this new cultural revolution, in most cases seeking technological innovation for the pure joy of discovery, and seeing the potential for a tool of liberation.
Examples of early university Internet communities are Cleveland FreeNet, Blacksburg
Electronic Village and Nova Scotia. Students took up the opportunity of free
communications and saw this new phenomenon as a tool of liberation. Personal
computers and the Internet would free them from corporations and governments
(Nelson, Jennings, Stallman).
‘The culture of individual freedom sprouting in the university campuses of the 1960s and 1970s used computer networking to its own ends’ (Castells 2001).
The students approached it through a culturally revolutionary way of thinking, similar to that of Ted Nelson or Douglas Engelbart. Students agreed with the ideas of free software and cooperative use of resources, which had always been part of the early hacker ethic. (Castells 2001)
Graduate students also played a huge part in the creation of ARPANET. In the 1960s, the Network Working Group, which did most of the design for ARPANET's protocols, was composed mainly of graduate students.
Today's Internet
The Opera Community rack. From the top, user file storage (content of files.myopera.com), "bigma" (the master MySQL database server), and two IBM blade centers containing multi-purpose machines (Apache front ends, Apache back ends, slave MySQL database servers, load balancers, file servers, cache servers and sync masters).
Aside from the complex physical connections that make up its infrastructure, the
Internet is facilitated by bi- or multi-lateral commercial contracts (e.g., peering
agreements), and by technical specifications or protocols that describe how to
exchange data over the network. Indeed, the Internet is essentially defined by its
interconnections and routing policies.
As of September 30, 2007, 1.244 billion people use the Internet according to
Internet World Stats. Writing in the Harvard International Review, philosopher
N.J.Slabbert, a writer on policy issues for the Washington DC-based Urban Land
Institute, has asserted that the Internet is fast becoming a basic feature of global
civilization, so that what has traditionally been called "civil society" is now becoming
identical with information technology society as defined by Internet use. Only 2% of
the World's population regularly accesses the internet.
Internet protocols
In this context, there are three layers of protocols:
At the lower level (OSI layer 3) is IP (Internet Protocol), which defines the
datagrams or packets that carry blocks of data from one node to another. The
vast majority of today's Internet uses version four of the IP protocol (i.e.
IPv4), and although IPv6 is standardized, it exists only as "islands" of
connectivity, and there are many ISPs without any IPv6 connectivity. ICMP
(Internet Control Message Protocol) also exists at this level. ICMP is
connectionless; it is used for control, signaling, and error reporting purposes.
TCP (Transmission Control Protocol) and UDP (User Datagram Protocol) exist
at the next layer up (OSI layer 4); these are the protocols by which data is
transmitted. TCP makes a virtual 'connection', which gives some level of
guarantee of reliability. UDP is a best-effort, connectionless transport, in
which data packets that are lost in transit will not be re-sent.
The application protocols sit on top of TCP and UDP and occupy layers 5, 6,
and 7 of the OSI model. These define the specific messages and data formats
sent and understood by the applications running at each end of the
communication. Examples of these protocols are HTTP, FTP, and SMTP.
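The layering can be seen directly in code. The Python sketch below opens a TCP connection (which in turn rides on IP) and sends a hand-written application-layer HTTP request over it; the host name is only an example and the snippet requires network access.

```python
# Layering in practice: an application-layer HTTP request carried over a TCP
# connection, which rides on IP (requires network access; the host name is
# just an example).
import socket

with socket.create_connection(("example.com", 80)) as tcp:   # TCP over IP
    tcp.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
    reply = b""
    while chunk := tcp.recv(4096):
        reply += chunk

print(reply.split(b"\r\n")[0].decode())   # status line, e.g. "HTTP/1.1 200 OK"
```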
Internet structure
There have been many analyses of the Internet and its structure. For example, it has
been determined that the Internet IP routing structure and hypertext links of the
World Wide Web are examples of scale-free networks.
Similar to the way the commercial Internet providers connect via Internet exchange
points, research networks tend to interconnect into large subnetworks such as:
GEANT
GLORIAD
The Internet2 Network (formerly known as the Abilene Network)
JANET (the UK's national research and education network)
These in turn are built around relatively smaller networks. See also the list of
academic computer network organizations.
In network diagrams, the Internet is often represented by a cloud symbol, into and
out of which network communications can pass.
ICANN
ICANN headquarters in Marina del Rey, California, United States.
The Internet Corporation for Assigned Names and Numbers (ICANN) is the authority
that coordinates the assignment of unique identifiers on the Internet, including
domain names, Internet Protocol (IP) addresses, and protocol port and parameter
numbers. A globally unified namespace (i.e., a system of names in which there is at
most one holder for each possible name) is essential for the Internet to function.
ICANN is headquartered in Marina del Rey, California, but is overseen by an
international board of directors drawn from across the Internet technical, business,
academic, and non-commercial communities. The US government continues to have
the primary role in approving changes to the root zone file that lies at the heart of
the domain name system. Because the Internet is a distributed network comprising
many voluntarily interconnected networks, the Internet, as such, has no governing
body. ICANN's role in coordinating the assignment of unique identifiers distinguishes
it as perhaps the only central coordinating body on the global Internet, but the scope
of its authority extends only to the Internet's systems of domain names, IP
addresses, protocol ports and parameter numbers.
On November 16, 2005, the World Summit on the Information Society, held in Tunis,
established the Internet Governance Forum (IGF) to discuss Internet-related issues.
Language
The prevalent language for communication on the Internet is English. This may be a
result of the Internet's origins, as well as English's role as the lingua franca. It may
also be related to the poor capability of early computers, largely originating in the
United States, to handle characters other than those in the English variant of the
Latin alphabet.
After English (31% of Web visitors) the most-requested languages on the World Wide Web are Chinese 16%, Spanish 9%, Japanese 7%, German 5% and French 5% (from Internet World Stats, updated June 30, 2007).
By continent, 37% of the world's Internet users are based in Asia, 27% in Europe, 19% in North America, and 9% in Latin America and the Caribbean (updated September 30, 2007).
The Internet's technologies have developed enough in recent years, especially in the
use of Unicode, that good facilities are available for development and communication
in most widely used languages. However, some glitches such as mojibake (incorrect
display of foreign language characters, also known as kryakozyabry) still remain.
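Mojibake is simply the result of decoding bytes with the wrong character encoding, as the short Python example below demonstrates.

```python
# Mojibake in miniature: text encoded as UTF-8 but decoded with the wrong
# character encoding comes out garbled.
original = "naïve café"                      # contains non-ASCII characters
encoded = original.encode("utf-8")           # the bytes actually transmitted
print(encoded.decode("latin-1"))             # wrong decoding: "naÃ¯ve cafÃ©"
print(encoded.decode("utf-8"))               # correct decoding: "naïve café"
```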
Internet and the workplace
The Internet is allowing greater flexibility in working hours and location, especially
with the spread of unmetered high-speed connections and Web applications.
The Internet viewed on mobile devices
The Internet can now be accessed virtually anywhere by numerous means. Mobile
phones, datacards, handheld game consoles and cellular routers allow users to
connect to the Internet from anywhere there is a cellular network supporting that
device's technology.
Common uses of the Internet
The concept of sending electronic text messages between parties in a way analogous
to mailing letters or memos predates the creation of the Internet. Even today it can
be important to distinguish between Internet and internal e-mail systems. Internet
e-mail may travel and be stored unencrypted on many other networks and machines
out of both the sender's and the recipient's control. During this time it is quite
possible for the content to be read and even tampered with by third parties, if
anyone considers it important enough. Purely internal or intranet mail systems,
where the information never leaves the corporate or organization's network, are
much more secure, although in any organization there will be IT and other personnel
whose job may involve monitoring, and occasionally accessing, the email of other
employees not addressed to them.
The World Wide Web
Graphic representation of less than 0.0001% of the WWW, representing some of the
hyperlinks
Many people use the terms Internet and World Wide Web (or just the Web)
interchangeably, but, as discussed above, the two terms are not synonymous.
The World Wide Web is a huge set of interlinked documents, images and other
resources, linked by hyperlinks and URLs. These hyperlinks and URLs allow the web-
servers and other machines that store originals, and cached copies, of these
resources to deliver them as required using HTTP (Hypertext Transfer Protocol).
HTTP is only one of the communication protocols used on the Internet.
Web services also use HTTP to allow software systems to communicate in order to
share and exchange business logic and data.
Software products that can access the resources of the Web are correctly termed
user agents. In normal use, web browsers such as Internet Explorer and Firefox access web pages and allow users to navigate from one to another via hyperlinks.
Web documents may contain almost any combination of computer data including
photographs, graphics, sounds, text, video, multimedia and interactive content
including games, office applications and scientific demonstrations.
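At its simplest, a user agent just issues an HTTP request and reads back the resource. The Python sketch below does this with the standard urllib module; the URL is only an example and the call requires network access.

```python
# A user agent at its simplest: fetching one web resource over HTTP with
# Python's standard library (requires network access; the URL is an example).
from urllib.request import urlopen

with urlopen("http://example.com/") as response:
    html = response.read().decode("utf-8", errors="replace")
    print(response.status, len(html), "characters of HTML")
```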
Through keyword-driven Internet research using search engines like Yahoo! and Google, millions of people worldwide have easy, instant access to a vast and diverse
amount of online information. Compared to encyclopedias and traditional libraries,
the World Wide Web has enabled a sudden and extreme decentralization of
information and data.
Using the Web, it is also easier than ever before for individuals and organisations to publish ideas and information to an extremely large audience. Anyone can find ways to publish a web page or build a website for very little initial cost.
maintaining large, professional websites full of attractive, diverse and up-to-date
information is still a difficult and expensive proposition, however.
Many individuals and some companies and groups use "web logs" or blogs, which are
largely used as easily-updatable online diaries. Some commercial organizations
encourage staff to fill them with advice on their areas of specialization in the hope
that visitors will be impressed by the expert knowledge and free information, and be
attracted to the corporation as a result. One example of this practice is Microsoft,
whose product developers publish their personal blogs in order to pique the public's
interest in their work.
Collections of personal web pages published by large service providers remain
popular, and have become increasingly sophisticated. Whereas operations such as
Angelfire and GeoCities have existed since the early days of the Web, newer
offerings from, for example, Facebook and MySpace currently have large followings.
These operations often brand themselves as social network services rather than
simply as web page hosts.
Advertising on popular web pages can be lucrative, and e-commerce or the sale of
products and services directly via the Web continues to grow.
In the early days, web pages were usually created as sets of complete and isolated
HTML text files stored on a web server. More recently, web sites are more often
created using content management system (CMS) or wiki software with, initially,
very little content. Users of these systems, who may be paid staff, members of a
club or other organization or members of the public, fill the underlying databases
with content using editing pages designed for that purpose, while casual visitors view
and read this content in its final HTML form. There may or may not be editorial,
approval and security systems built into the process of taking newly entered content
and making it available to the target visitors.
Remote access
The Internet allows computer users to connect to other computers and information
stores easily, wherever they may be across the world. They may do this with or
without the use of security, authentication and encryption technologies, depending
on the requirements.
This is encouraging new ways of working from home, collaboration and information
sharing in many industries. An accountant sitting at home can audit the books of a
company based in another country, on a server situated in a third country that is
remotely maintained by IT specialists in a fourth. These accounts could have been
created by home-working book-keepers, in other remote locations, based on
information e-mailed to them from offices all over the world. Some of these things
were possible before the widespread use of the Internet, but the cost of private,
leased lines would have made many of them infeasible in practice.
An office worker away from their desk, perhaps on the other side of the world on a business trip or a holiday, can open a remote desktop session into their normal office
PC using a secure Virtual Private Network (VPN) connection via the Internet. This
gives the worker complete access to all of their normal files and data, including e-
mail and other applications, while away from the office.
This concept is also referred to by some network security people as the Virtual
Private Nightmare, because it extends the secure perimeter of a corporate network
into its employees' homes; this has been the source of some notable security
breaches, but also provides security for the workers.
Collaboration
The low cost and nearly instantaneous sharing of ideas, knowledge, and skills has
made collaborative work dramatically easier. Not only can a group cheaply
communicate and test, but the wide reach of the Internet allows such groups to
easily form in the first place, even among niche interests. An example of this is the
free software movement in software development which produced GNU and Linux
from scratch and has taken over development of Mozilla and OpenOffice.org
(formerly known as Netscape Communicator and StarOffice). Films such as Zeitgeist,
Loose Change and Endgame have had extensive coverage on the internet, while
being virtually ignored in the mainstream media.
Internet 'chat', whether in the form of IRC 'chat rooms' or channels or via instant messaging systems, allows colleagues to stay in touch in a very convenient way when working at their computers during the day. Messages can be sent and viewed even more quickly and conveniently than via e-mail. Extensions to these systems may allow files to be exchanged and 'whiteboard' drawings to be shared, as well as voice and
video contact between team members.
Version control systems allow collaborating teams to work on shared sets of
documents without either accidentally overwriting each other's work or having
members wait until they get 'sent' documents to be able to add their thoughts and
changes.
File sharing
A computer file can be e-mailed to customers, colleagues and friends as an
attachment. It can be uploaded to a Web site or FTP server for easy download by
others. It can be put into a "shared location" or onto a file server for instant use by
colleagues. The load of bulk downloads to many users can be eased by the use of
"mirror" servers or peer-to-peer networks.
In any of these cases, access to the file may be controlled by user authentication;
the transit of the file over the Internet may be obscured by encryption and money
may change hands before or after access to the file is given. The price can be paid
by the remote charging of funds from, for example, a credit card whose details are also passed (hopefully fully encrypted) across the Internet. The origin and
authenticity of the file received may be checked by digital signatures or by MD5 or
other message digests.
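As an illustration of such integrity checks, the Python sketch below computes a message digest with the standard hashlib module; the file name in the commented-out call is hypothetical, and in practice the result would be compared against a digest published by the sender.

```python
# Verifying the integrity of received data with a message digest; the file
# name in the commented-out call is hypothetical.
import hashlib

def file_digest(path, algorithm="md5"):
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# print(file_digest("downloaded_song.mp3"))  # compare with the published digest
print(hashlib.md5(b"example file contents").hexdigest())
```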
These simple features of the Internet, available on a worldwide basis, are changing the basis for the production, sale, and distribution of anything that can be reduced to a
computer file for transmission. This includes all manner of print publications,
software products, news, music, film, video, photography, graphics and the other
arts. This in turn has caused seismic shifts in each of the existing industries that
previously controlled the production and distribution of these products.
Internet collaboration technology enables business and project teams to share
documents, calendars and other information. Such collaboration occurs in a wide
variety of areas including scientific research, software development, conference
planning, political activism and creative writing.
Streaming media
Many existing radio and television broadcasters provide Internet 'feeds' of their live
audio and video streams (for example, the BBC). They may also allow time-shift
viewing or listening such as Preview, Classic Clips and Listen Again features. These
providers have been joined by a range of pure Internet 'broadcasters' who never had
on-air licenses. This means that an Internet-connected device, such as a computer
or something more specific, can be used to access on-line media in much the same
way as was previously possible only with a television or radio receiver. The range of
material is much wider, from pornography to highly specialized technical Web-casts.
Podcasting is a variation on this theme, where—usually audio—material is first
downloaded in full and then may be played back on a computer or shifted to a digital
audio player to be listened to on the move. These techniques using simple
equipment allow anybody, with little censorship or licensing control, to broadcast
audio-visual material on a worldwide basis.
Webcams can be seen as an even lower-budget extension of this phenomenon. While
some webcams can give full frame rate video, the picture is usually either small or
updates slowly. Internet users can watch animals around an African waterhole, ships
in the Panama Canal, the traffic at a local roundabout or their own premises, live and
in real time. Video chat rooms, video conferencing, and remote controllable webcams
are also popular. Many uses can be found for personal webcams in and around the
home, with and without two-way sound.
Voice telephony (VoIP)
VoIP stands for Voice over IP, where IP refers to the Internet Protocol that underlies
all Internet communication. This phenomenon began as an optional two-way voice
extension to some of the Instant Messaging systems that took off around the year
2000. In recent years many VoIP systems have become as easy to use and as
convenient as a normal telephone. The benefit is that, as the Internet carries the
actual voice traffic, VoIP can be free or cost much less than a normal telephone call,
especially over long distances and especially for those with always-on Internet
connections such as cable or ADSL.
Thus VoIP is maturing into a viable alternative to traditional telephones.
Interoperability between different providers has improved and the ability to call or
receive a call from a traditional telephone is available. Simple inexpensive VoIP
modems are now available that eliminate the need for a PC.
Voice quality can still vary from call to call but is often equal to and can even exceed
that of traditional calls.
Remaining problems for VoIP include emergency telephone number dialing and
reliability. Currently a few VoIP providers provide an emergency service but it is not
universally available. Traditional phones are line-powered and operate during a power failure; VoIP does not do so without a backup power source for the electronics.
Most VoIP providers offer unlimited national calling but the direction in VoIP is clearly
toward global coverage with unlimited minutes for a low monthly fee.
VoIP has also become increasingly popular within the gaming world, as a form of
communication between players. Popular gaming VoIP clients include Ventrilo and
Teamspeak, and there are others available also. The PlayStation 3 also features a
VoIP chat feature.
Internet access
Common methods of home access include dial-up, landline broadband (over coaxial
cable, fiber optic or copper wires), Wi-Fi, satellite and 3G technology cell phones.
Public places to use the Internet include libraries and Internet cafes, where
computers with Internet connections are available. There are also Internet access
points in many public places such as airport halls and coffee shops, in some cases
just for brief use while standing. Various terms are used, such as "public Internet
kiosk", "public access terminal", and "Web payphone". Many hotels now also have
public terminals, though these are usually fee-based. These terminals are widely used for purposes such as ticket booking, bank deposits, and online payment. Wi-
Fi provides wireless access to computer networks, and therefore can do so to the
Internet itself. Hotspots providing such access include Wi-Fi-cafes, where a would-be
user needs to bring their own wireless-enabled devices such as a laptop or PDA.
These services may be free to all, free to customers only, or fee-based. A hotspot
need not be limited to a confined location. A whole campus or park, or even an entire city, can be enabled. Grassroots efforts have led to wireless community
networks. Commercial WiFi services covering large city areas are in place in London,
Vienna, Toronto, San Francisco, Philadelphia, Chicago and Pittsburgh. The Internet
can then be accessed from such places as a park bench.
Apart from Wi-Fi, there have been experiments with proprietary mobile wireless
networks like Ricochet, various high-speed data services over cellular phone
networks, and fixed wireless services.
High-end mobile phones such as smartphones generally come with Internet access
through the phone network. Web browsers such as Opera are available on these
advanced handsets, which can also run a wide variety of other Internet software.
More mobile phones have Internet access than PCs, though this is not as widely
used. An Internet access provider and protocol matrix differentiates the methods
used to get online.
Leisure
The Internet has been a major source of leisure since before the World Wide Web,
with entertaining social experiments such as MUDs and MOOs being conducted on
university servers, and humor-related Usenet groups receiving much of the main
traffic. Today, many Internet forums have sections devoted to games and funny
videos; short cartoons in the form of Flash movies are also popular. Over 6 million
people use blogs or message boards as a means of communication and for the
sharing of ideas.
The pornography and gambling industries have both taken full advantage of the
World Wide Web, and often provide a significant source of advertising revenue for
other Web sites. Although many governments have attempted to put restrictions on
both industries' use of the Internet, this has generally failed to stop their widespread
popularity.
One main area of leisure on the Internet is multiplayer gaming. This form of leisure
creates communities, bringing people of all ages and origins to enjoy the fast-paced
world of multiplayer games. These range from MMORPG to first-person shooters,
from role-playing games to online gambling. This has revolutionized the way many
people interact and spend their free time on the Internet.
While online gaming has been around since the 1970s, modern modes of online
gaming began with services such as GameSpy and MPlayer, which players of games
would typically subscribe to. Non-subscribers were limited to certain types of
gameplay or certain games.
Many use the Internet to access and download music, movies and other works for
their enjoyment and relaxation. As discussed above, there are paid and unpaid
sources for all of these, using centralized servers and distributed peer-to-peer
technologies. Discretion is needed as some of these sources take more care over the
original artists' rights and over copyright laws than others.
Many use the World Wide Web to access news, weather and sports reports, to plan
and book holidays and to find out more about their random ideas and casual
interests.
People use chat, messaging and email to make and stay in touch with friends
worldwide, sometimes in the same way as some previously had pen pals. Social networking Web sites such as MySpace and Facebook, along with many others like them, also put and keep people in contact for their enjoyment.
The Internet has seen a growing number of Internet operating systems, where users
can access their files, folders, and settings via the Internet. An example of an open-source web OS is eyeOS.
Cyberslacking has become a serious drain on corporate resources; the average UK
employee spends 57 minutes a day surfing the Web at work, according to a study by
Peninsula Business Services.
A survey performed by the JWT advertising agency showed that most Americans say they cannot live without the Internet for more than a week. 1,011 Americans participated in the survey, answering questions such as how long they could do without the Internet. About 15 percent said they could live without the Internet for no more than a day, 21 percent said no more than a couple of days, and 19 percent said a few days. Only about a third of respondents said they would be able to live without the Internet for a week.
Complex architecture
Many computer scientists see the Internet as a "prime example of a large-scale,
highly engineered, yet highly complex system". The Internet is extremely
heterogeneous. (For instance, data transfer rates and physical characteristics of
connections vary widely.) The Internet exhibits "emergent phenomena" that depend
on its large-scale organization. For example, data transfer rates exhibit temporal
self-similarity. Further adding to the complexity of the Internet is the ability of more than one computer to use the Internet through only one node, thus creating the possibility for very deep, hierarchically organized sub-networks that can theoretically be extended indefinitely (disregarding the addressing limitations of the IPv4 protocol).
However, since principles of this architecture date back to the 1960s, it might not be
a solution best suited to modern needs, and thus the possibility of developing
alternative structures is currently being looked into.
According to a June 2007 article in Discover Magazine, the combined weight of all the electrons moved within the Internet in a day is 0.2 millionths of an ounce. Others have estimated this at closer to 2 ounces (50 grams).
Marketing
The Internet has also become a large market for companies; some of the biggest
companies today have grown by taking advantage of the efficient nature of low-cost
advertising and commerce through the Internet, also known as e-commerce. It is the
fastest way to spread information to a vast number of people simultaneously. The Internet has also revolutionized shopping: for example, a person can order a CD online and receive it in the mail within a couple of days, or download it
directly in some cases. The Internet has also greatly facilitated personalized
marketing which allows a company to market a product to a specific person or a
specific group of people more so than any other advertising medium.
Examples of personalized marketing include online communities such as MySpace,
Friendster, Orkut, Facebook and others which thousands of Internet users join to
advertise themselves and make friends online. Many of these users are young teens
and adolescents ranging from 13 to 25 years old. In turn, when they advertise
themselves they advertise interests and hobbies, which online marketing companies can use to gauge what those users are likely to purchase online, and to advertise their own products to those users.
The name Internet
Internet is traditionally written with a capital first letter, as it is a proper noun. The
Internet Society, the Internet Engineering Task Force, the Internet Corporation for
Assigned Names and Numbers, the World Wide Web Consortium, and several other
Internet-related organizations use this convention in their publications.
Many newspapers, newswires, periodicals, and technical journals capitalize the term
(Internet). Examples include The New York Times, the Associated Press, Time, The
Times of India, Hindustan Times, and Communications of the ACM.
Others assert that the first letter should be in lower case (internet), and that the
specific article “the” is sufficient to distinguish “the internet” from other internets. A
significant number of publications use this form, including The Economist, the
Canadian Broadcasting Corporation, the Financial Times, The Guardian, The Times,
and The Sydney Morning Herald. As of 2005, many publications using internet
appear to be located outside of North America—although one U.S. news source,
Wired News, has adopted the lower-case spelling.
Historically, Internet and internet have had different meanings, with internet
meaning “an interconnected set of distinct networks,” and Internet referring to the
world-wide, publicly available IP internet. Under this distinction, "the Internet" is the familiar network via which websites exist; however, "an internet" can exist between any two remote locations. Any group of distinct networks connected together is an
internet; each of these networks may or may not be part of the Internet. The
distinction was evident in many RFCs, books, and articles from the 1980s and early
1990s (some of which, such as RFC 1918, refer to "internets" in the plural), but has
recently fallen into disuse. Instead, the term intranet is generally used for private
networks, whether they are connected to the Internet or not. See also: extranet.
Some people use the lower-case term as a medium (like radio or newspaper, e.g.
I've found it on the internet), and first letter capitalized as the global network.
Programming language
A programming language is an artificial language that can be used to control the
behavior of a machine, particularly a computer. Programming languages, like natural
languages, are defined by syntactic and semantic rules which describe their structure
and meaning respectively. Many programming languages have some form of written
specification of their syntax and semantics; some are defined only by an official
implementation.
Programming languages are used to facilitate communication about the task of
organizing and manipulating information, and to express algorithms precisely. Some
authors restrict the term "programming language" to those languages that can
express all possible algorithms; sometimes the term "computer language" is used for
more limited artificial languages.
Thousands of different programming languages have been created, and new
languages are created every year.
Definitions
Traits often considered important for constituting a programming language:
Function: A programming language is a language used to write computer
programs, which involve a computer performing some kind of computation or
algorithm and possibly control external devices such as printers, robots, and
so on.
Target: Programming languages differ from natural languages in that natural
languages are only used for interaction between people, while programming
languages also allow humans to communicate instructions to machines. Some
programming languages are used by one device to control another. For
example, PostScript programs are frequently created by another program to
control a computer printer or display.
Constructs: Programming languages may contain constructs for defining and
manipulating data structures or controlling the flow of execution.
Expressive power: The theory of computation classifies languages by the
computations they can express (see Chomsky hierarchy). All Turing complete
languages can implement the same set of algorithms. ANSI/ISO SQL and
Charity are examples of languages that are not Turing complete yet often
called programming languages.
Non-computational languages, such as markup languages like HTML or formal
grammars like BNF, are usually not considered programming languages. Often a
programming language is embedded in the non-computational (host) language.
Purpose
A prominent purpose of programming languages is to provide instructions to a
computer. As such, programming languages differ from most other forms of human
expression in that they require a greater degree of precision and completeness.
When using a natural language to communicate with other people, human authors
and speakers can be ambiguous and make small errors, and still expect their intent
to be understood. However, computers do exactly what they are told to do, and
cannot understand the code the programmer "intended" to write. The combination of
the language definition, the program, and the program's inputs must fully specify the
external behavior that occurs when the program is executed.
Many languages have been designed from scratch, altered to meet new needs,
combined with other languages, and eventually fallen into disuse. Although there
have been attempts to design one "universal" computer language that serves all
purposes, all of them have failed to be accepted in this role. The need for diverse
computer languages arises from the diversity of contexts in which languages are
used:
Programs range from tiny scripts written by individual hobbyists to huge
systems written by hundreds of programmers.
Programmers range in expertise from novices who need simplicity above all
else, to experts who may be comfortable with considerable complexity.
Programs must balance speed, size, and simplicity on systems ranging from
microcontrollers to supercomputers.
Programs may be written once and not change for generations, or they may
undergo nearly constant modification.
Finally, programmers may simply differ in their tastes: they may be
accustomed to discussing problems and expressing them in a particular
language.
One common trend in the development of programming languages has been to add
more ability to solve problems using a higher level of abstraction. The earliest
programming languages were tied very closely to the underlying hardware of the
computer. As new programming languages have developed, features have been
added that let programmers express ideas that are more removed from simple
translation into underlying hardware instructions. Because programmers are less tied
to the needs of the computer, their programs can do more computing with less effort
from the programmer. This lets them write more programs in the same amount of
time.
Natural language processors have been proposed as a way to eliminate the need for
a specialized language for programming. However, this goal remains distant and its
benefits are open to debate. Edsger Dijkstra took the position that the use of a
formal language is essential to prevent the introduction of meaningless constructs,
and dismissed natural language programming as "foolish." Alan Perlis was similarly
dismissive of the idea.
Elements
Syntax
Parse tree of Python code with inset tokenization
Syntax highlighting is often used to aid programmers in the recognition of elements of source code. The language shown here is Python.
A programming language's surface form is known as its syntax. Most programming
languages are purely textual; they use sequences of text including words, numbers,
and punctuation, much like written natural languages. On the other hand, there are
some programming languages which are more graphical in nature, using spatial
relationships between symbols to specify a program.
The syntax of a language describes the possible combinations of symbols that form a
syntactically correct program. The meaning given to a combination of symbols is
handled by semantics. Since most languages are textual, this article discusses
textual syntax.
Programming language syntax is usually defined using a combination of regular
expressions (for lexical structure) and Backus-Naur Form (for grammatical
structure). Below is a simple grammar, based on Lisp:
expression ::= atom | list
atom ::= number | symbol
number ::= [+-]?['0'-'9']+
symbol ::= ['A'-'Z' 'a'-'z'].*
list ::= '(' expression* ')'
This grammar specifies the following:
an expression is either an atom or a list;
an atom is either a number or a symbol;
a number is an unbroken sequence of one or more decimal digits, optionally
preceded by a plus or minus sign;
a symbol is a letter followed by zero or more of any characters (excluding
whitespace); and
a list is a matched pair of parentheses, with zero or more expressions inside
it.
The following are examples of well-formed token sequences in this grammar:
'12345', '()', '(a b c232 (1))'
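To make the grammar concrete, the following short Python sketch (not part of the original text; the helper names tokenize and parse_expression are invented for illustration) recognizes these token sequences, using a regular expression for the lexical structure and a recursive function for the grammatical structure. The symbol rule is narrowed slightly here so that a closing parenthesis ends a symbol.

import re

# Lexical structure, mirroring the regular expressions in the grammar above.
# Parentheses are excluded from symbols so that "(a b c232 (1))" tokenizes cleanly.
TOKEN = re.compile(r"""
    (?P<number>[+-]?[0-9]+)      |   # number ::= [+-]?['0'-'9']+
    (?P<symbol>[A-Za-z][^\s()]*) |   # symbol ::= a letter, then more characters
    (?P<lparen>\()               |
    (?P<rparen>\))               |
    (?P<space>\s+)
""", re.VERBOSE)

def tokenize(text):
    """Split the input into (kind, value) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError("unexpected character: " + text[pos])
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

def parse_expression(tokens, i=0):
    """expression ::= atom | list; returns (parsed value, index of next token)."""
    kind, value = tokens[i]
    if kind in ("number", "symbol"):        # atom ::= number | symbol
        return value, i + 1
    if kind == "lparen":                    # list ::= '(' expression* ')'
        items, i = [], i + 1
        while tokens[i][0] != "rparen":
            item, i = parse_expression(tokens, i)
            items.append(item)
        return items, i + 1
    raise SyntaxError("unexpected token: " + value)

for sample in ["12345", "()", "(a b c232 (1))"]:
    print(sample, "->", parse_expression(tokenize(sample))[0])

Running this prints the nested structure of each example, for instance ['a', 'b', 'c232', ['1']] for the last one.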
Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules, and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit undefined behavior. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it.
Using natural language as an example, it may not be possible to assign a meaning to
a grammatically correct sentence or the sentence may be false:
"Colorless green ideas sleep furiously." is grammatically well-formed but has
no generally accepted meaning.
"John is a married bachelor." is grammatically well-formed but expresses a
meaning that cannot be true.
The following C language fragment is syntactically correct, but performs an operation
that is not semantically defined (because p is a null pointer, the operations p->real
and p->im have no meaning):
complex *p = NULL;
complex abs_p = sqrt (p->real * p->real + p->im * p->im);
The grammar needed to specify a programming language can be classified by its
position in the Chomsky hierarchy. The syntax of most programming languages can
be specified using a Type-2 grammar, i.e., they are context-free grammars.
Type system
For more details on these topics, see Type system and Type safety.
A type system defines how a programming language classifies values and
expressions into types, how it can manipulate those types and how they interact.
This generally includes a description of the data structures that can be constructed in
the language. The design and study of type systems using formal mathematics is
known as type theory.
Internally, all data in modern digital computers are stored simply as zeros or ones
(binary).
Typed versus untyped languages
A language is typed if operations defined for one data type cannot be performed on
values of another data type. For example, "this text between the quotes" is a string.
In most programming languages, dividing a number by a string has no meaning.
Most modern programming languages will therefore reject any program attempting
to perform such an operation. In some languages, the meaningless operation will be
detected when the program is compiled ("static" type checking), and rejected by the
compiler, while in others, it will be detected when the program is run ("dynamic"
type checking), resulting in a runtime exception.
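As a small illustration (using Python, a dynamically checked language, purely as an example), the meaningless operation of dividing a number by a string is only rejected when the offending line actually runs, at which point a runtime exception is raised:

# Dividing a number by a string has no defined meaning in Python.
# The program is accepted, but the operation is rejected at run time
# ("dynamic" type checking) with a TypeError exception.
try:
    result = 10 / "this text between the quotes"
except TypeError as error:
    print("rejected at run time:", error)

A statically checked language would instead reject such a program when it is compiled, before it ever runs.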
A special case of typed languages are the single-type languages. These are often
scripting or markup languages, such as Rexx or SGML, and have only one data type
— most commonly character strings which are used for both symbolic and numeric
data.
In contrast, an untyped language, such as most assembly languages, allows any
operation to be performed on any data, which are generally considered to be
sequences of bits of various lengths. High-level languages which are untyped include
BCPL and some varieties of Forth.
In practice, while few languages are considered typed from the point of view of type
theory (verifying or rejecting all operations), most modern languages offer a degree
of typing. Many production languages provide means to bypass or subvert the type
system.
Static versus dynamic typing
In static typing all expressions have their types determined prior to the program
being run (typically at compile-time). For example, 1 and (2+2) are integer
expressions; they cannot be passed to a function that expects a string, or stored in a
variable that is defined to hold dates.
Statically-typed languages can be manifestly typed or type-inferred. In the first case,
the programmer must explicitly write types at certain textual positions (for example,
at variable declarations). In the second case, the compiler infers the types of
expressions and declarations based on context. Most mainstream statically-typed
languages, such as C++ and Java, are manifestly typed. Complete type inference
has traditionally been associated with less mainstream languages, such as Haskell
and ML. However, many manifestly typed languages support partial type inference;
for example, Java and C# both infer types in certain limited cases.
Dynamic typing,
also called latent typing, determines the type-safety of operations at runtime; in
other words, types are associated with runtime values rather than textual
expressions. As with type-inferred languages, dynamically typed languages do not
require the programmer to write explicit type annotations on expressions. Among
other things, this may permit a single variable to refer to values of different types at
different points in the program execution. However, type errors cannot be
automatically detected until a piece of code is actually executed, making debugging
more difficult. Ruby, Lisp, JavaScript, and Python are dynamically typed.
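The following minimal Python sketch (illustrative only) shows what dynamic typing looks like in practice: types belong to the run-time values rather than to variable names, so the same name may refer to values of different types at different points in the program, and no type annotations are required.

thing = 1                 # the value 1 carries the type int
print(type(thing))        # <class 'int'>

thing = "two"             # the same name now refers to a str value
print(type(thing))        # <class 'str'>

# A statically typed, manifestly typed language such as Java or C++ would
# require a declared type for 'thing' and would reject the reassignment
# above at compile time rather than letting the type change at run time.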
Weak and strong typing
Weak typing allows a value of one type to be treated as another, for example
treating a string as a number. This can occasionally be useful, but it can also allow
some kinds of program faults to go undetected at compile time.
Strong typing prevents the above. Attempting to mix types raises an error. Strongly-
typed languages are often termed type-safe or safe. Type safety can prevent
particular kinds of program faults occurring (because constructs containing them are
flagged at compile time).
An alternative definition for "weakly typed" refers to languages, such as Perl,
JavaScript, and C++ which permit a large number of implicit type conversions; Perl
in particular can be characterized as a dynamically typed programming language in
which type checking can take place at runtime. See type system. This capability is
often useful, but occasionally dangerous; as it would permit operations whose
objects can change type on demand.
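A brief Python example (a sketch, not drawn from the text) shows the strongly typed behaviour: mixing a string and a number raises an error rather than silently coercing, whereas a weakly typed language such as JavaScript would coerce the same expression and produce the string "34".

# Python refuses to treat the string "3" as a number implicitly.
try:
    total = "3" + 4
except TypeError as error:
    print("strong typing:", error)

# An explicit conversion states the intent and is accepted.
total = int("3") + 4
print(total)    # prints 7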
Strong and static are generally considered orthogonal concepts, but usage in the
literature differs. Some use the term strongly typed to mean strongly, statically
typed, or, even more confusingly, to mean simply statically typed. Thus C has been
called both strongly typed and weakly, statically typed.
Execution semantics
Once data has been specified, the machine must be instructed to perform operations
on the data. The execution semantics of a language defines how and when the
various constructs of a language should produce a program behavior.
For example, the semantics may define the strategy by which expressions are
evaluated to values, or the manner in which control structures conditionally execute
statements.
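As a small illustration (in Python, used here only as an example), the language's execution semantics dictate that the boolean operators evaluate their right operand only when it is needed, and that an if statement executes only the selected branch:

def noisy(value):
    # Print a message so we can see exactly when evaluation happens.
    print("evaluating", value)
    return value

result = noisy(False) and noisy(True)   # the right operand is never evaluated
print(result)                           # prints False

if noisy(True):
    print("only this branch runs")
else:
    print("this branch is never executed")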
Core library
Most programming languages have an associated core library (sometimes known as
the 'Standard library', especially if it is included as part of the published language
standard), which is conventionally made available by all implementations of the
language. Core libraries typically include definitions for commonly used algorithms,
data structures, and mechanisms for input and output.
A language's core library is often treated as part of the language by its users,
although the designers may have treated it as a separate entity. Many language
specifications define a core that must be made available in all implementations, and
in the case of standardized languages this core library may be required. The line
between a language and its core library therefore differs from language to language.
Indeed, some languages are designed so that the meanings of certain syntactic
constructs cannot even be described without referring to the core library. For
example, in Java, a string literal is defined as an instance of the java.lang.String
class; similarly, in Smalltalk, an anonymous function expression (a "block")
constructs an instance of the library's BlockContext class. Conversely, Scheme
contains multiple coherent subsets that suffice to construct the rest of the language
as library macros, and so the language designers do not even bother to say which
portions of the language must be implemented as language constructs, and which
must be implemented as parts of a library.
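The same point can be made with Python (a sketch offered for illustration): the string literal below is an instance of the built-in str class, while commonly used algorithms and data structures such as counters and square roots come from the standard library rather than from the language syntax itself.

import collections
import math

greeting = "hello"
print(isinstance(greeting, str))                # True: the literal is a library-defined object
letter_counts = collections.Counter(greeting)   # a standard-library data structure
print(letter_counts["l"])                       # prints 2
print(math.sqrt(16))                            # prints 4.0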
Practice
A language's designers and users must construct a number of artifacts that govern
and enable the practice of programming. The most important of these artifacts are
the language specification and implementation.
Specification
The specification of a programming language is intended to provide a definition
that language users and implementers can use to determine the behavior of a
program, given its source code.
A programming language specification can take several forms, including the
following:
An explicit definition of the syntax and semantics of the language. While
syntax is commonly specified using a formal grammar, semantic definitions
may be written in natural language (e.g., the C language), or a formal
semantics (e.g., the Standard ML and Scheme specifications).
A description of the behavior of a translator for the language (e.g., the C++
and Fortran specifications). The syntax and semantics of the language have to
be inferred from this description, which may be written in natural or a formal
language.
A reference or model implementation, sometimes written in the language
being specified (e.g., Prolog or ANSI REXX). The syntax and semantics of the
language are explicit in the behavior of the reference implementation.
Implementation
An implementation of a programming language provides a way to execute programs written in that language on one or more configurations of hardware and software. There are,
broadly, two approaches to programming language implementation: compilation and
interpretation. It is generally possible to implement a language using either
technique.
The output of a compiler may be executed by hardware or a program called an
interpreter. In some implementations that make use of the interpreter approach
there is no distinct boundary between compiling and interpreting. For instance, some
implementations of the BASIC programming language compile and then execute the
source a line at a time.
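CPython, the reference implementation of Python, is one concrete example of this blurred boundary (offered here as an illustration): it first compiles source code to bytecode and then interprets that bytecode, and the standard dis module can display the compiled form.

import dis

def add_one(x):
    return x + 1

# Show the bytecode that the interpreter will execute; the exact opcode
# names (e.g. LOAD_FAST, BINARY_ADD or BINARY_OP, RETURN_VALUE) vary
# between Python versions.
dis.dis(add_one)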
Programs that are executed directly on the hardware usually run several orders of
magnitude faster than those that are interpreted in software.
One technique for improving the performance of interpreted programs is just-in-time
compilation. Here the virtual machine monitors which sequences of bytecode are
frequently executed and translates them to machine code for direct execution on the
hardware.
History
A selection of textbooks that teach programming, in languages both popular and
obscure. These are only a few of the thousands of programming languages and
dialects that have been designed in history.
Early developments
The first programming languages predate the modern computer. The 19th century
had "programmable" looms and player piano scrolls which implemented what are
today recognized as examples of domain-specific programming languages. By the
beginning of the twentieth century, punch cards encoded data and directed
mechanical processing. In the 1930s and 1940s, the formalisms of Alonzo Church's
lambda calculus and Alan Turing's Turing machines provided mathematical
abstractions for expressing algorithms; the lambda calculus remains influential in
language design.
In the 1940s, the first electrically powered digital computers were created. The
computers of the early 1950s, notably the UNIVAC I and the IBM 701, used machine
language programs. First generation machine language programming was quickly
superseded by a second generation of programming languages known as Assembly
languages. Later in the 1950s, assembly language programming, which had evolved
to include the use of macro instructions, was followed by the development of three
higher-level programming languages: FORTRAN, LISP, and COBOL. Updated versions
of all of these are still in general use, and importantly, each has strongly influenced
the development of later languages. At the end of the 1950s, the language
formalized as Algol 60 was introduced, and most later programming languages are,
in many respects, descendants of Algol. The format and use of early programming languages were heavily influenced by the constraints of the interface.
Refinement
The period from the 1960s to the late 1970s brought the development of the major
language paradigms now in use, though many aspects were refinements of ideas in
the very first third-generation programming languages:
APL introduced array programming and influenced functional programming.
PL/I (NPL) was designed in the early 1960s to incorporate the best ideas from
FORTRAN and COBOL.
In the 1960s, Simula was the first language designed to support object-
oriented programming; in the mid-1970s, Smalltalk followed with the first
"purely" object-oriented language.
C was developed between 1969 and 1973 as a systems programming
language, and remains popular.
Prolog, designed in 1972, was the first logic programming language.
In 1978, ML built a polymorphic type system on top of Lisp, pioneering
statically typed functional programming languages.
Each of these languages spawned an entire family of descendants, and most modern
languages count at least one of them in their ancestry.
The 1960s and 1970s also saw considerable debate over the merits of structured
programming, and whether programming languages should be designed to support
it. Edsger Dijkstra, in a famous 1968 letter published in the Communications of the
ACM, argued that GOTO statements should be eliminated from all "higher level"
programming languages.
The 1960s and 1970s also saw expansion of techniques that reduced the footprint of
a program as well as improved productivity of the programmer and user. A card deck for an early 4GL was much smaller than a 3GL deck expressing the same functionality.
Consolidation and growth
The 1980s were years of relative consolidation. C++ combined object-oriented and
systems programming. The United States government standardized Ada, a systems
programming language intended for use by defense contractors. In Japan and
elsewhere, vast sums were spent investigating so-called "fifth generation" languages
that incorporated logic programming constructs. The functional languages
community moved to standardize ML and Lisp. Rather than inventing new
paradigms, all of these movements elaborated upon the ideas invented in the
previous decade.
One important trend in language design during the 1980s was an increased focus on
programming for large-scale systems through the use of modules, or large-scale
organizational units of code. Modula-2, Ada, and ML all developed notable module
systems in the 1980s, although other languages, such as PL/I, already had extensive
support for modular programming. Module systems were often wedded to generic
programming constructs.
The rapid growth of the Internet in the mid-1990s created opportunities for new
languages. Perl, originally a Unix scripting tool first released in 1987, became
common in dynamic Web sites. Java came to be used for server-side programming.
These developments were not fundamentally novel, rather they were refinements to
existing languages and paradigms, and largely based on the C family of
programming languages.
Programming language evolution continues, in both industry and research. Current
directions include security and reliability verification, new kinds of modularity
(mixins, delegates, aspects), and database integration.
4GLs are examples of domain-specific languages, such as SQL, which manipulates and returns sets of data rather than the scalar values that are canonical to most programming languages. Perl, for example, can hold multiple 4GL programs, as well as multiple JavaScript programs, inside its 'here documents' within its own Perl code, and can use variable interpolation in the 'here document' to support multi-language programming.
Measuring language usage
It is difficult to determine which programming languages are most widely used, and
what usage means varies by context. One language may occupy the greater number of programmer hours, a different one may have more lines of code, and a third may utilize the most CPU time. Some languages are very popular for particular kinds of applications.
For example, COBOL is still strong in the corporate data center, often on large
mainframes; FORTRAN in engineering applications; C in embedded applications and
operating systems; and other languages are regularly used to write many different
kinds of applications.
Various methods of measuring language popularity, each subject to a different bias
over what is measured, have been proposed:
counting the number of job advertisements that mention the language
the number of books sold that teach or describe the language
estimates of the number of existing lines of code written in the language, which may underestimate languages not often found in public searches
counts of language references found using a web search engine
Taxonomies
There is no overarching classification scheme for programming languages. A given
programming language does not usually have a single ancestor language. Languages
commonly arise by combining the elements of several predecessor languages with
new ideas in circulation at the time. Ideas that originate in one language will diffuse
throughout a family of related languages, and then leap suddenly across familial
gaps to appear in an entirely different family.
The task is further complicated by the fact that languages can be classified along
multiple axes. For example, Java is both an object-oriented language (because it
encourages object-oriented organization) and a concurrent language (because it
contains built-in constructs for running multiple threads in parallel). Python is an
object-oriented scripting language.
In broad strokes, programming languages divide into programming paradigms and a
classification by intended domain of use. Paradigms include procedural programming,
object-oriented programming, functional programming, and logic programming;
some languages are hybrids of paradigms or multi-paradigmatic. An assembly
language is not so much a paradigm as a direct model of an underlying machine
architecture. By purpose, programming languages might be considered general
purpose, system programming languages, scripting languages, domain-specific
languages, or concurrent/distributed languages (or a combination of these). Some
general purpose languages were designed largely with educational goals.
A programming language may also be classified by factors unrelated to programming
paradigm. For instance, most programming languages use English language
keywords, while a minority do not. Other languages may be classified as being
esoteric or not.
Windows XP
Windows XP is a line of operating systems developed by Microsoft for use on
general-purpose computer systems, including home and business desktops,
notebook computers, and media centers. The letters "XP" stand for eXPerience. It
was codenamed "Whistler", after Whistler, British Columbia, as many Microsoft
employees skied at the Whistler-Blackcomb ski resort during its development.
Windows XP is the successor to both Windows 2000 Professional and Windows Me,
and is the first consumer-oriented operating system produced by Microsoft to be built
on the Windows NT kernel and architecture. Windows XP was first released on
October 25, 2001, and over 400 million copies were in use in January 2006,
according to an estimate in that month by an IDC analyst. It is succeeded by
Windows Vista, which was released to volume license customers on November 8,
2006, and worldwide to the general public on January 30, 2007.
The most common editions of the operating system are Windows XP Home Edition,
which is targeted at home users, and Windows XP Professional, which has additional
features such as support for Windows Server domains and two physical processors,
and is targeted at power users and business clients. Windows XP Media Center
Edition has additional multimedia features enhancing the ability to record and watch
TV shows, view DVD movies, and listen to music. Windows XP Tablet PC Edition is
designed to run the ink-aware Tablet PC platform. Two separate 64-bit versions of
Windows XP were also released, Windows XP 64-bit Edition for IA-64 (Itanium)
processors and Windows XP Professional x64 Edition for x86-64.
Windows XP is known for its improved stability and efficiency over the 9x versions of
Microsoft Windows. It presents a significantly redesigned graphical user interface, a
change Microsoft promoted as more user-friendly than previous versions of Windows.
New software management capabilities were introduced to avoid the "DLL hell" that
plagued older consumer-oriented 9x versions of Windows. It is also the first version
of Windows to use product activation to combat software piracy, a restriction that did
not sit well with some users and privacy advocates. Windows XP has also been
criticized by some users for security vulnerabilities, tight integration of applications
such as Internet Explorer and Windows Media Player, and for aspects of its default
user interface.
Windows XP had been in development since early 1999, when Microsoft started
working on Windows Neptune, an operating system intended to be the "Home
Edition" equivalent to Windows 2000 Professional. It was eventually merged into the
Whistler project, which later became Windows XP.
Editions
The two major editions are Windows XP Home Edition, designed for home users, and
Windows XP Professional, designed for business and power-users. Other builds of
Windows XP include those built for specialized hardware and limited-feature versions
sold in Europe and select developing economies.
Windows XP Professional offers a number of features unavailable in the Home
Edition, including:
The ability to become part of a Windows Server domain, a group of computers
that are remotely managed by one or more central servers.
A sophisticated access control scheme that allows specific permissions on files
to be granted to specific users under normal circumstances. However, users
can use tools other than Windows Explorer (like cacls or File Manager), or
restart to Safe Mode to modify access control lists.
Remote Desktop server, which allows a PC to be operated by another
Windows XP user over a local area network or the Internet.
Offline Files and Folders, which allow the PC to automatically store a copy of
files from another networked computer and work with them while
disconnected from the network.
Encrypting File System, which encrypts files stored on the computer's hard
drive so they cannot be read by another user, even with physical access to
the storage medium.
Centralized administration features, including Group Policies, Automatic
Software Installation and Maintenance, Roaming User Profiles, and Remote
Installation Service (RIS).
Internet Information Services (IIS), Microsoft's HTTP and FTP Server.
Support for two physical central processing units (CPU). (Because the number
of CPU cores and Hyper-threading capabilities on modern CPUs are considered
to be part of a single physical processor, multicore CPUs are supported using
XP Home Edition.)
Windows Management Instrumentation Console (WMIC): WMIC is a
command-line tool designed to ease WMI information retrieval about a
system by using simple keywords (aliases).
Windows XP for specialized hardware
Microsoft has also customized Windows XP to suit different markets. Six different
versions of Windows XP for specific hardware were designed, two of them specifically
for 64-bit processors.
Windows XP 64-bit Edition
This edition was designed specifically for Itanium-based workstations. This
edition was discontinued in early 2005, after HP, the last distributor of
Itanium-based workstations, stopped selling Itanium systems marketed as
'workstations'. However, Itanium support continues in the server editions of
Windows.
Windows XP Professional x64 Edition
Not to be confused with the aforementioned Itanium edition of Windows XP,
this edition is derived from Windows Server 2003 and supports the x86-64
extension of the Intel IA-32 architecture. x86-64 is implemented by AMD as
"AMD64", found in AMD's Opteron and Athlon 64 chips, and implemented by
Intel as "Intel 64" (formerly known as IA-32e and EM64T), found in Intel's
Pentium 4 and later chips.
Microsoft had previously supported other microprocessors with earlier
versions of the Windows NT operating system line (including two 64-bit lines,
the DEC Alpha and the MIPS R4000, although Windows NT used them as 32-
bit processors). The files necessary for all of the architectures were included
on the same installation CD and did not require the purchase of separate
versions.
Windows XP Media Center Edition
This edition is designed for media center PCs. Originally, it was only available
bundled with one of these computers, and could not be purchased separately.
In 2003 the Media Center Edition was updated as "Windows XP Media Center
Edition 2003", which added additional features such as FM radio tuning.
Another update was released in 2004, and again in 2005, which was the first
edition available for System Builders. Many of the features of Windows XP
Media Center Edition 2005 (including screen dancers, auto playlist DJ, and
high-end visual screen savers) were taken from the Windows XP Plus! packages. These were originally shipped as add-ons to Windows XP to enhance the user's experience of their Windows XP machine.
Windows XP Tablet PC Edition
Intended for specially designed notebook/laptop computers called tablet PCs,
the Tablet PC Edition is compatible with a pen-sensitive screen, supporting
handwritten notes and portrait-oriented screens. It cannot be purchased
separately from a Tablet PC without an MSDN (Microsoft Developers Network)
subscription.
Windows XP Embedded
An edition for specific consumer electronics, set-top boxes, kiosks/ATMs,
medical devices, arcade video games, point-of-sale terminals, and Voice over
Internet Protocol (VoIP) components.
Windows Fundamentals for Legacy PCs
In July 2006, Microsoft introduced a "thin-client" version of Windows XP called
Windows Fundamentals for Legacy PCs, which targets older machines (as
early as the original Pentium). It is only available to Software Assurance
customers. It is intended for those who would like to upgrade to Windows XP
to take advantage of its security and management capabilities, but can't
afford to purchase new hardware.
Windows XP Starter Edition
Windows XP Starter Edition is a lower-cost version of Windows XP available in
Thailand, Turkey, Malaysia, Indonesia, Russia, India, Colombia, Brazil, Argentina,
Peru, Bolivia, Chile, Mexico, Ecuador, Uruguay and Venezuela. It is similar to
Windows XP Home, but is limited to low-end hardware, can only run 3 programs at a
time, and has some other features either removed or disabled by default.
According to a Microsoft press release, Windows XP Starter Edition is "a low-cost
introduction to the Microsoft Windows XP operating system designed for first-time
desktop PC users in developing countries."
Specializations
The Starter Edition includes some special features for certain markets where
consumers may not be computer literate. Not found in the Home Edition, these
include localized help features for those who may not speak English, a country-
specific computer wallpaper and screensavers, and other default settings designed
for easier use than typical Windows XP installations.
In addition, the Starter Edition also has some unique limitations to prevent it from
displacing more expensive versions of Windows XP. Only three applications can be
run at once on the Starter Edition, and each application may open a maximum of
three windows. The maximum screen resolution is 1024×768, and there is no
support for workgroup networking or domains. In addition, the Starter Edition is
licensed only for low-end processors like Intel's Celeron or AMD's Duron and
Sempron. There is also a 512 MB limit on main memory and a 120 GB disk size limit
(Microsoft has not made it clear, however, if this is for total disk space, per partition,
or per disk). There are also fewer options for customizing the themes, desktop, and
taskbar.
Market adoption
On October 9, 2006, Microsoft announced that they reached a milestone of
1,000,000 units of Windows XP Starter Edition sold. In the mass market, however,
the Starter Edition has not had much success. Many markets where it is available
have seen the uptake of illegally cracked or pirated versions of the software instead.
Windows XP Edition N
In March 2004, the European Commission fined Microsoft €497 million (US$603
million) and ordered the company to provide a version of Windows without Windows
Media Player. The Commission concluded that Microsoft "broke European Union
competition law by leveraging its near monopoly in the market for PC operating
systems onto the markets for work group server operating systems and for media
players". Microsoft is currently appealing the ruling. In the meantime, a court-
compliant version has been released. This version does not include the company's
Windows Media Player but instead encourages users to pick and download their own
media player. Microsoft wanted to call this version Reduced Media Edition, but EU
regulators objected and suggested the Edition N name, with the N signifying "not
with Media Player" for both Home and Professional editions of Windows XP. Due to
the fact that it is sold at the same price as the version with Windows Media Player
included, Dell, Hewlett-Packard, Lenovo and Fujitsu Siemens have chosen not to
stock the product. However, Dell did offer the operating system for a short time.
Consumer interest has been low, with roughly 1,500 units shipped to OEMs, and no
reported sales to consumers.
Languages
Windows XP is available in many languages. In addition, add-ons translating the user
interface are also available for certain languages.
New and updated features
Windows XP introduced several new features to the Windows line, including:
Faster start-up and hibernation sequences
The ability to discard a newer device driver in favour of the previous one
(known as driver rollback), should a driver upgrade not produce desirable
results
A new, arguably more user-friendly interface, including the framework for
developing themes for the desktop environment
Fast user switching, which allows a user to save the current state and open
applications of their desktop and allow another user to log on without losing
that information
The ClearType font rendering mechanism, which is designed to improve text
readability on Liquid Crystal Display (LCD) and similar monitors
Remote Desktop functionality, which allows users to connect to a computer
running Windows XP from across a network or the Internet and access their
applications, files, printers, and devices
Support for most DSL modems and wireless network connections, as well as
networking over FireWire, and Bluetooth.
User interface
The new "task grouping" feature introduced in Windows XP
Windows XP features a new task-based graphical user interface. The Start menu and
search capability were redesigned and many visual effects were added, including:
A translucent blue selection rectangle in Explorer
Drop shadows for icon labels on the desktop
Task-based sidebars in Explorer windows ("common tasks")
The ability to group the taskbar buttons of the windows of one application into
one button
The ability to lock the taskbar and other toolbars to prevent accidental
changes
The highlighting of recently added programs on the Start menu
Shadows under menus (Windows 2000 had shadows under mouse pointers,
but not menus)
Windows XP analyses the performance impact of visual effects and uses this to
determine whether to enable them, so as to prevent the new functionality from
consuming excessive additional processing overhead. Users can further customize
these settings. Some effects, such as alpha blending (transparency and fading), are
handled entirely by many newer video cards. However, if the video card is not
capable of hardware alpha blending, performance can be substantially hurt and
Microsoft recommends the feature should be turned off manually. Windows XP adds
the ability for Windows to use "Visual Styles" to change the user interface. However,
visual styles must be cryptographically signed by Microsoft to run. Luna is the name
of the new visual style that ships with Windows XP, and is enabled by default for
machines with more than 64 MiB of RAM. Luna refers only to one particular visual
style, not to all of the new user interface features of Windows XP as a whole. In
order to use unsigned visual styles, many users turn to software such as TGTSoft's
StyleXP or Stardock's WindowBlinds. Some users "patch" the uxtheme.dll file that restricts visual styles, so that styles created by the general public or by the user can run on Windows XP.
The default wallpaper, Bliss, is a BMP photograph of a landscape in the Napa Valley
outside Napa, California, with rolling green hills and a blue sky with stratocumulus
and cirrus clouds.
The Windows 2000 "classic" interface can be used instead if preferred. Several third
party utilities exist that provide hundreds of different visual styles. In addition,
another Microsoft-created theme, called "Royale", was included with Windows XP
Media Center Edition, and was also released for other versions of Windows XP.
System requirements
System requirements for Windows XP Home and Professional editions are as follows:
Processor: 233 MHz minimum; 300 MHz or higher recommended
Memory: 64 MB RAM minimum (may limit performance and some features); 128 MB RAM or higher recommended
Video adapter and monitor: Super VGA (800 x 600) minimum; Super VGA (800 x 600) or higher resolution recommended
Hard drive free space: 1.5 GB minimum; 1.5 GB or higher recommended
Drives: CD-ROM minimum; CD-ROM or better recommended
Devices: Keyboard and mouse
Other: Sound card, speakers, and headphones
In addition to the Windows XP system requirements, Service Pack 2 requires an
additional 1.8 GB of free hard disk space during installation.
Service packs
Microsoft occasionally releases service packs for its Windows operating systems to fix
problems and add features. Each service pack is a superset of all previous service
packs and patches so that only the latest service pack needs to be installed, and also
includes new revisions. Older patches need not be removed before application of the
most recent one.
Service Pack 1
Service Pack 1 (SP1) for Windows XP was released on September 9, 2002. It
contains post-RTM security fixes and hot-fixes, compatibility updates, optional .NET
Framework support, enabling technologies for new devices such as Tablet PCs, and a
new Windows Messenger 4.7 version. The most notable new features were USB 2.0
support, and a Set Program Access and Defaults utility that aimed at hiding various
middleware products. Users can control the default application for activities such as
web browsing and instant messaging, as well as hide access to some of Microsoft's
bundled programs. This utility was first brought into the older Windows 2000
operating system with its Service Pack 3.
On February 3, 2003, Microsoft released Service Pack 1 (SP1) again as Service Pack
1a (SP1a). This release removed Microsoft's Java virtual machine as a result of a
lawsuit with Sun Microsystems.
Service Pack 2
Service Pack 2 (SP2) (codenamed "Springboard") was released on August 6, 2004
after several delays, with a special emphasis on security. Unlike the previous service
packs, SP2 adds new functionality to Windows XP, including an enhanced firewall,
improved Wi-Fi support, such as WPA encryption compatibility, with a wizard utility,
a pop-up ad blocker for Internet Explorer, and Bluetooth support. Security
enhancements include a major revision to the included firewall which was renamed
to Windows Firewall and is enabled by default, advanced memory protection that
takes advantage of the NX bit that is incorporated into newer processors to stop
some forms of buffer overflow attacks, and removal of raw socket support (which
supposedly limits the damage done by zombie machines). Additionally, security-
related improvements were made to e-mail and web browsing. Windows XP Service
Pack 2 includes the Windows Security Center, which provides a general overview of
security on the system, including the state of anti-virus software, Windows Update,
and the new Windows Firewall. Third-party anti-virus and firewall applications can
interface with the new Security Center.
On August 10, 2007, Microsoft announced a minor update to Service Pack 2, called
Service Pack 2c (SP2c). The update addresses the dwindling number of available product keys for Windows XP. It is available only to system builders from their distributors, for the Windows XP Professional and Windows XP Professional N operating systems. SP2c was released in September 2007.
Service Pack 3
Windows XP Service Pack 3 (SP3) is currently in development. As of November 2007,
Microsoft's web site indicates a "preliminary" release date to be in the first half of
2008. A feature set overview has been posted by Microsoft and details new features
available separately as standalone updates to Windows XP, as well as features
backported from Windows Vista, such as black hole router detection, Network Access
Protection and Windows Imaging Component.
Microsoft has begun a beta test of Service Pack 3. According to a file released with
the official beta, and relayed onto the Internet, there are a total of 1,073 fixes in SP3. This update allows Windows XP to be installed without a product key and to run until the end of the mandatory 30-day activation period without one. The latest
testing build of Service Pack 3 is the Release Candidate build, 3264.
On December 4, 2007, Microsoft released a release candidate of Service Pack 3 to
both TechNet and MSDN Subscribers. On December 18, 2007, this version was made
publicly available via Microsoft Download Center.
Support lifecycle
Support for Windows XP without a service pack (RTM) ended on September 30, 2004
and support for Windows XP Service Pack 1 and 1a ended on October 10, 2006.
Mainstream support for Windows XP Service Pack 2 will end on April 14, 2009, four
years after its general availability. As per Microsoft's posted timetable, the company
will stop licensing Windows XP to OEMs and terminate retail sales of the operating
system June 30, 2008, 17 months after the release of Windows Vista.
On April 14, 2009, Windows XP will begin its "Extended Support" period, which will last for five years until April 8, 2014.
Common criticisms
Security issues
Windows XP has been criticized for its susceptibility to malware, viruses, trojan
horses, and worms. Security issues are compounded by the fact that users of the
Home edition, by default, receive an administrator account that provides unrestricted
access to the underpinnings of the system. If the administrator's account is broken
into, there is no limit to the control that can be asserted over the compromised PC.
Windows, with its large market share, has historically been a tempting target for
virus creators. Security holes are often invisible until they are exploited, making
preemptive action difficult. Microsoft has stated that the release of patches to fix
security holes is often what causes the spread of exploits against those very same
holes, as crackers figure out what problems the patches fixed and then launch attacks against unpatched systems. Microsoft recommends that all systems have
automatic updates turned on to prevent a system from being attacked by an
unpatched bug, but some business IT departments need to test updates before
deployment across systems to predict compatibility issues with custom software and
infrastructure. This deployment turn-around time also lengthens the time that
systems are left insecure in the event of a released software exploit.
User interface and performance
Critics have claimed that the default Windows XP user interface (Luna) adds visual
clutter and wastes screen space while offering no new functionality and running more
slowly – with some even calling it 'the Fisher-Price interface'. Users who do not like
the new interface can easily switch back to the Windows Classic theme.
Integration of operating system features
In light of the United States v. Microsoft case which resulted in Microsoft being
convicted for abusing its operating system monopoly to overwhelm competition in
other markets, Windows XP has drawn fire for integrating user applications such as
Windows Media Player and Windows Messenger into the operating system, as well as
for its close ties to the Windows Live ID service.
Backward compatibility
Some users switching from Windows 9x to XP disliked its lack of DOS support.
Although XP comes with the ability to run DOS programs in a virtual DOS machine
along with the COMMAND.COM program from MS-DOS, XP still has trouble running
many old DOS programs. This is largely due to the fact that it is NT-based and does
not use DOS as a base OS.
Windows Vista
Windows Vista (pronounced /ˈvɪstə/) is a line of operating systems developed by
Microsoft for use on personal computers, including home and business desktops,
laptops, Tablet PCs, and media centers. Prior to its announcement on July 22, 2005,
Windows Vista was known by its codename "Longhorn". Development was completed on November 8, 2006; over the following three months it was released in stages to computer hardware and software manufacturers, business customers, and retail channels. On January 30, 2007, it was released worldwide to the general public, and was made available for purchase and download from Microsoft's web site. The release of Windows Vista comes more than five years after the introduction
of its predecessor, Windows XP, making it the longest time span between two
releases of Microsoft Windows.
Windows Vista contains hundreds of new and reworked features; some of the most
significant include an updated graphical user interface and visual style dubbed
Windows Aero, improved searching features, new multimedia creation tools such as
Windows DVD Maker, and completely redesigned networking, audio, print, and
display sub-systems. Vista also aims to increase the level of communication between
machines on a home network using peer-to-peer technology, making it easier to
share files and digital media between computers and devices. Windows Vista includes
version 3.0 of the .NET Framework, which aims to make it significantly easier for
developers to write applications than with the traditional Windows API.
Microsoft's primary stated objective with Windows Vista, however, has been to
improve the state of security in the Windows operating system. One common
criticism of Windows XP and its predecessors has been their commonly exploited
security vulnerabilities and overall susceptibility to malware, viruses and buffer
overflows. In light of this, Microsoft chairman Bill Gates announced in early 2002 a
company-wide "Trustworthy Computing initiative" which aims to incorporate security
work into every aspect of software development at the company. Microsoft stated
that it prioritized improving the security of Windows XP and Windows Server 2003
above finishing Windows Vista, thus delaying its completion.
Windows Vista has received a number of negative assessments. PC World listed it #1
of "the 15 biggest tech disappointments of 2007," saying that "many users are
clinging to XP like shipwrecked sailors to a life raft, while others who made the
upgrade are switching back." Criticisms of Windows Vista include protracted
development time, more restrictive licensing terms, the inclusion of a number of new
Digital Rights Management technologies aimed at restricting the copying of protected
digital media, lack of device drivers for some hardware, and the usability of other
new features such as User Account Control.
Development
Microsoft started work on its plans for Windows Vista ("Longhorn") in 2001, prior
to the release of Windows XP. It was originally expected to ship sometime late in
2003 as a minor step between Windows XP (codenamed "Whistler") and "Blackcomb"
(now known as Windows 7). Gradually, "Longhorn" assimilated many of the
important new features and technologies slated for "Blackcomb", resulting in the
release date being pushed back several times. Many of Microsoft's developers were
also re-tasked with improving the security of Windows XP. Faced with ongoing
delays and concerns about feature creep, Microsoft announced on August 27, 2004
that it was making changes. The original "Longhorn", based on the Windows XP
source code, was scrapped, and Vista development started anew, building on the
Windows Server 2003 codebase, and re-incorporating only the features that would
be intended for an actual operating system release. Some previously announced
97
features such as WinFS were dropped or postponed, and a new software
development methodology called the "Security Development Lifecycle" was
incorporated in an effort to address concerns with the security of the Windows
codebase.
After "Longhorn" was named Windows Vista, an unprecedented beta-test program
was started, involving hundreds of thousands of volunteers and companies. In
September 2005, Microsoft started releasing regular Community Technology
Previews (CTP) to beta testers. The first of these was distributed at the 2005
Microsoft Professional Developers Conference, and was subsequently released to
beta testers and Microsoft Developer Network subscribers. The builds that followed
incorporated most of the planned features for the final product, as well as a number
of changes to the user interface, based largely on feedback from beta testers.
Windows Vista was deemed feature-complete with the release of the "February CTP",
released on February 22, 2006, and much of the remainder of work between that
build and the final release of the product focused on stability, performance,
application and driver compatibility, and documentation. Beta 2, released in late
May, was the first build to be made available to the general public through
Microsoft's Customer Preview Program. It was downloaded by over five million
people. Two release candidates followed in September and October, both of which
were made available to a large number of users.
While Microsoft had originally hoped to have the operating system available
worldwide in time for Christmas 2006, it was announced in March 2006 that the
release date would be pushed back to January 2007, in order to give the company –
and the hardware and software companies which Microsoft depends on for providing
device drivers – additional time to prepare. Through much of 2006, analysts and
bloggers had speculated that Windows Vista would be delayed further, owing to anti-
trust concerns raised by the European Commission and South Korea, and due to a
perceived lack of progress with the beta releases. However, with the November 8,
2006 announcement of the completion of Windows Vista, Microsoft's lengthiest
operating system development project came to an end.
New or improved features
End-user features
The appearance of Windows Explorer has changed since Windows XP.
Windows Aero: The new hardware-based graphical user interface, named
Windows Aero – an acronym for Authentic, Energetic, Reflective, and Open.
The new interface is intended to be cleaner and more aesthetically pleasing
than those of previous versions of Windows, including new transparencies, live
thumbnails, live icons, animations, and eye candy.
Windows Shell: The new Windows shell is significantly different from Windows
XP, offering a new range of organization, navigation, and search capabilities.
Windows Explorer's task panel has been removed, integrating the relevant
task options into the toolbar. A "Favorite links" panel has been added,
enabling one-click access to common directories. The address bar has been
replaced with a breadcrumb navigation system. The preview panel allows
users to see thumbnails of various files and view the contents of documents.
The details panel shows information such as file size and type, and allows
viewing and editing of embedded tags in supported file formats. The Start
menu has changed as well; it no longer uses ever-expanding boxes when
navigating through Programs. The word "Start" itself has been removed in
favor of a blue Windows Orb (also called "Pearl").
Instant Search (also known as search as you type): Windows Vista features a
new way of searching called Instant Search, which is significantly faster and
more in-depth (content-based) than the search features found in any of the
previous versions of Windows.
Windows Sidebar: A transparent panel anchored to the side of the screen
where a user can place Desktop Gadgets, which are small applets designed
for a specialized purpose (such as displaying the weather or sports scores).
Gadgets can also be placed on other parts of the desktop.
Windows Internet Explorer 7: New user interface, tabbed browsing, RSS, a
search box, improved printing, Page Zoom, Quick Tabs (thumbnails of all
open tabs), Anti-Phishing filter, a number of new security protection
features, Internationalized Domain Name support (IDN), and improved web
standards support. IE7 in Windows Vista runs in isolation from other
applications in the operating system (protected mode); exploits and
malicious software are restricted from writing to any location beyond
Temporary Internet Files without explicit user consent.
Windows Media Player 11
Windows Media Player 11, a major revamp of Microsoft's program for playing
and organizing music and video. New features in this version include word
wheeling (or "search as you type") , a new GUI for the media library, photo
display and organization, the ability to share music libraries over a network
with other Vista machines, Xbox 360 integration, and support for other Media
Center Extenders.
Backup and Restore Center: Includes a backup and restore application that
gives users the ability to schedule periodic backups of files on their computer,
as well as recovery from previous backups. Backups are incremental, storing
only the changes each time, minimizing the disk usage. It also features
Complete PC Backup (available only in Ultimate, Business, and Enterprise
versions) which backs up an entire computer as an image onto a hard disk or
DVD. Complete PC Backup can automatically recreate a machine setup onto
new hardware or hard disk in case of any hardware failures. Complete PC
Restore can be initiated from within Windows Vista, or from the Windows
Vista installation CD in the event the PC is so corrupt that it cannot start up
normally from the hard disk.
Windows Mail: A replacement for Outlook Express that includes a new mail
store that improves stability, and features integrated Instant Search. It has
the Phishing Filter like IE7 and Junk mail filtering that is enhanced through
regular updates via Windows Update.
Windows Calendar is a new calendar and task application.
Windows Photo Gallery, a photo and movie library management application.
WPG can import from digital cameras, tag and rate individual items, adjust
colors and exposure, create and display slideshows (with pan and fade
effects), and burn slideshows to DVD.
Windows DVD Maker, a companion program to Windows Movie Maker that
provides the ability to create video DVDs based on a user's content. Users can
design a DVD with title, menu, video, soundtrack, pan and zoom motion
effects on pictures or slides.
Windows Media Center, which was previously exclusively bundled as a
separate version of Windows XP, known as Windows XP Media Center Edition,
has been incorporated into the Home Premium and Ultimate editions of
Windows Vista.
Games and Games Explorer: Games included with Windows have been
modified to showcase Vista's graphics capabilities. New games are Chess
Titans, Mahjong Titans and Purble Place. A new Games Explorer special folder
holds shortcuts and information to all games on the user's computer.
Windows Mobility Center.
Windows Mobility Center is a control panel that centralizes the most relevant
information related to mobile computing (brightness, sound, battery level /
power scheme selection, wireless network, screen orientation, presentation
settings, etc.).
Windows Meeting Space replaces NetMeeting. Users can share applications (or
their entire desktop) with other users on the local network, or over the
Internet using peer-to-peer technology (editions higher than Starter and
Home Basic can host such sessions, while the lower editions are limited to
"join" mode only).
Shadow Copy automatically creates daily backup copies of files and folders.
Users can also create "shadow copies" by setting a System Protection Point
using the System Protection tab in the System control panel. The user can be
presented multiple versions of a file throughout a limited history and be
allowed to restore, delete, or copy those versions. This feature is available
only in the Business, Enterprise, and Ultimate editions of Windows Vista and
is inherited from Windows Server 2003.
Windows Update with Windows Ultimate Extras
Windows Update: Software and security updates have been simplified, now
operating solely via a control panel instead of as a web application. Windows
Mail's spam filter and Windows Defender's definitions are updated
automatically via Windows Update. Users that choose the recommended
setting for Automatic Updates will have the latest drivers installed and
available when they add a new device.
Parental controls: Allows administrators to control which websites, programs,
and games each standard user can use and install. This feature is not
included in the Business or Enterprise editions of Vista.
Windows SideShow: Enables the auxiliary displays on newer laptops or on
supported Windows Mobile devices. It is meant to be used to display device
gadgets while the computer is on or off.
Speech recognition is integrated into Vista. It features a redesigned user
interface and configurable command-and-control commands. Unlike the Office
2003 version, which works only in Office and WordPad, Speech Recognition in
Windows Vista works for any accessible application. In addition, it currently
supports several languages: British and American English, Spanish, French,
German, Chinese (Traditional and Simplified), and Japanese.
New fonts, including several designed for screen reading, and improved
Chinese (Yahei, JhengHei), Japanese (Meiryo) and Korean (Malgun) fonts.
ClearType has also been enhanced and enabled by default.
Problem Reports and Solutions, a control panel which allows users to view
previously sent problems and any solutions or additional information that is
available.
Improved audio controls allow the system-wide volume or volume of
individual audio devices and even individual applications to be controlled
separately. New audio functionalities such as Room Correction, Bass
Management, Speaker Fill and Headphone virtualization have also been
incorporated.
Windows System Assessment Tool is a tool used to benchmark system
performance. Software such as games can retrieve this rating and modify its
own behavior at runtime to improve performance. The benchmark tests CPU,
RAM, 2-D and 3-D graphics acceleration, graphics memory, and hard disk space.
Windows Ultimate Extras: The Ultimate Edition of Windows Vista provides
access to extra games and tools, available through Windows Update. This
replaces the Microsoft Plus! software bundle that was sold alongside prior
versions of Windows.
Disk Management: A utility to modify hard disk drive partitions, including
shrinking, creating and formatting new partitions.
Performance Diagnostic Console includes various tools for tuning and
monitoring system performance and the resource activities of the CPU, disks,
network, memory and other resources. It shows the operations on files, the
opened connections, etc.
Core technologies
Windows Vista is intended to be a technology-based release, to provide a solid base
to include advanced technologies, many of which are related to how the system
functions, and hence not readily visible to the user. An example of this is the
complete restructuring of the architecture of the audio, print, display, and
networking subsystems; while the results of this work will be visible to software
developers, end-users will only see what appear to be evolutionary changes in the
user interface.
Vista includes technologies such as ReadyBoost and ReadyDrive which employ fast
flash memory (located on USB drives and hybrid hard disk drives respectively) to
improve system performance by caching commonly-used programs and data. This
manifests itself in improved battery life on notebook computers as well, since a
hybrid drive can be spun down when not in use. Another new technology, SuperFetch,
utilizes machine learning techniques to analyze usage patterns in order
to allow Windows Vista to make intelligent decisions about what content should be
present in system memory at any given time.
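The general idea behind this kind of caching can be illustrated with a short sketch. The following Python fragment is only a conceptual illustration (it is not how Vista itself works, and the class name SmallFastCache is invented for this example): it keeps the most recently used items in a small, fast store and falls back to slower storage on a miss, which is the same trade-off ReadyBoost and SuperFetch exploit with flash memory and system memory.

from collections import OrderedDict

class SmallFastCache:
    # Illustrative least-recently-used cache: recently used items stay in fast storage.
    def __init__(self, capacity, slow_read):
        self.capacity = capacity      # how many items fit in the "fast" store
        self.slow_read = slow_read    # function that fetches from slow storage
        self.items = OrderedDict()    # insertion order tracks recency

    def read(self, key):
        if key in self.items:         # cache hit: serve from fast storage
            self.items.move_to_end(key)
            return self.items[key]
        value = self.slow_read(key)   # cache miss: go to slow storage
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used item
        return value

# Example: "disk" reads are simulated by a dictionary lookup.
disk = {name: name.upper() for name in ["explorer", "winword", "notepad"]}
cache = SmallFastCache(capacity=2, slow_read=disk.__getitem__)
print(cache.read("explorer"))   # miss, fetched from the slow store
print(cache.read("explorer"))   # hit, served from the cache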
As part of the redesign of the networking architecture, IPv6 has been fully
incorporated into the operating system, and a number of performance improvements
have been introduced, such as TCP window scaling. Prior versions of Windows
typically needed third-party wireless networking software to work properly; this is no
longer the case with Vista, as it includes more comprehensive wireless networking
support.
For graphics, Vista introduces a new Windows Display Driver Model, as well as a
major revision to Direct3D. The new driver model facilitates the new Desktop
Window Manager, which provides the tearing-free desktop and special effects that
are the cornerstones of Windows Aero. Direct3D 10, developed in conjunction with
major display driver manufacturers, is a new architecture with more advanced
shader support, and allows the graphics processing unit to render more complex
scenes without assistance from the CPU. It features improved load balancing
between CPU and GPU and also optimizes data transfer between them.
At the core of the operating system, many improvements have been made to the
memory manager, process scheduler, heap manager, and I/O scheduler. A Kernel
Transaction Manager has been implemented that gives applications the ability to
work with the file system and registry using atomic transaction operations.
Security-related technologies
Improved security was a primary design goal for Vista. Microsoft's Trustworthy
Computing initiative, which aims to improve public trust in its products, has had a
direct effect on its development. This effort has resulted in a number of new security
and safety features.
User Account Control is perhaps the most significant and visible of these changes.
User Account Control is a security technology that makes it possible for users to use
their computer with fewer privileges by default. This was often difficult in previous
versions of Windows, as the previous "limited" user accounts proved too restrictive
and incompatible with a large proportion of application software, and even prevented
some basic operations such as looking at the calendar from the notification tray. In
Windows Vista, when an action requiring administrative rights is requested, the user
is first prompted for an administrator name and password; in cases where the user is
already an administrator, the user is still prompted to confirm the pending privileged
action. User Account Control asks for credentials in a Secure Desktop mode, where
the entire screen is blacked out, temporarily disabled, and only the authorization
window is active and highlighted. The intent is to stop a malicious program 'spoofing'
the user interface, attempting to capture admin credentials.
Internet Explorer 7's new security and safety features include a phishing filter, IDN
with anti-spoofing capabilities, and integration with system-wide parental controls.
For added security, ActiveX controls are disabled by default. Also, Internet Explorer
operates in a "protected mode" which operates with lower permissions than the user
and it runs in isolation from other applications in the operating system, preventing it
from accessing or modifying anything besides the Temporary Internet Files
directory. Microsoft's anti-spyware product, Windows Defender, has been
incorporated into Windows, providing protection against malware and other threats.
Changes to various system configuration settings (such as new auto-starting
applications) are blocked unless the user gives consent.
Another significant new feature is BitLocker Drive Encryption, a data protection
technology included in the Enterprise and Ultimate editions of Vista that provides
encryption for the entire operating system volume. Bitlocker can work in conjunction
with a Trusted Platform Module chip (version 1.2) that is on a computer's
motherboard, or with a USB key.
A variety of other privilege-restriction techniques are also built into Vista. An
example is the concept of "integrity levels" in user processes, whereby a process
with a lower integrity level cannot interact with processes of a higher integrity level
and cannot perform DLL injection into a process of a higher integrity level. The
security restrictions of Windows services are more fine-grained, so that services
(especially those listening on the network) have no ability to interact with parts of
the operating system they do not need to. Obfuscation techniques such as address
space layout randomization are used to increase the amount of effort required of
malware before successful infiltration of a system. Code Integrity verifies that
system binaries haven’t been tampered with by malicious code.
As part of the redesign of the network stack, Windows Firewall has been upgraded,
with new support for filtering both incoming and outgoing traffic. Advanced packet
filter rules can be created which can grant or deny communications to specific
services.
Business technologies
While much of the focus of Vista's new capabilities has been on the new user
interface, security technologies, and improvements to the core operating system,
Microsoft is also adding new deployment and maintenance features.
The WIM image format (Windows IMage) is the cornerstone of Microsoft's new
deployment and packaging system. WIM files, which contain an image of
Windows Vista, can be maintained and patched without having to rebuild new
images. Windows Images can be delivered via Systems Management Server
or Business Desktop Deployment technologies. Images can be customized
and configured with applications then deployed to corporate client personal
computers using little to no touch by a system administrator. ImageX is the
Microsoft tool used to create and customize images.
Windows Deployment Services replaces Remote Installation Services for
deploying Vista and prior versions of Windows.
Approximately 700 new Group Policy settings have been added, covering
most aspects of the new features in the operating system, as well as
significantly expanding the configurability of wireless networks, removable
storage devices, and the user desktop experience. Vista also introduced an
XML-based format (ADMX) to display registry-based policy settings, making it
easier to manage networks that span geographic locations and different
languages.
Services for UNIX has been renamed "Subsystem for UNIX-based
Applications," and is included with the Enterprise and Ultimate editions of
Vista. Network File System (NFS) client support is also included.
Multilingual User Interface - Unlike previous versions of Windows, which
required language packs to be loaded to provide local language support,
Windows Vista Ultimate and Enterprise editions support the ability to
dynamically change languages based on the logged on user's preference.
Wireless Projector support
Business customers who are enrolled in the Microsoft Software Assurance program
are offered a set of additional tools and services collectively known as the "Desktop
Optimization Pack". This includes the Microsoft SoftGrid application virtualization
platform, an asset inventory service, and additional tools for maintaining Group
Policy settings in a fashion similar to a revision control system.
Developer technologies
Windows Vista includes a large number of new application programming interfaces.
Chief among them is the inclusion of version 3.0 of the .NET Framework, which
consists of a class library and Common Language Runtime. Version 3.0 includes four
new major components:
Windows Presentation Foundation is a user interface subsystem and
framework based on vector graphics, which makes use of 3D computer graphics
hardware and Direct3D technologies. It provides the foundation for building
applications and blending together application UI, documents, and media
content. It is the successor to Windows Forms.
Windows Communication Foundation is a service-oriented messaging
subsystem which enables applications and systems to interoperate locally or
remotely using Web services.
Windows Workflow Foundation provides task automation and integrated
transactions using workflows. It is the programming model, engine and tools
for building workflow-enabled applications on Windows.
Windows CardSpace is a component which securely stores digital identities of
a person, and provides a unified interface for choosing the identity for a
particular transaction, such as logging into a website.
These technologies are also available for Windows XP and Windows Server 2003 to
facilitate their introduction to and usage by developers and end users.
There are also significant new development APIs in the core of the operating system,
notably the completely re-architected audio, networking, print, and video interfaces,
major changes to the security infrastructure, improvements to the deployment and
installation of applications ("ClickOnce" and Windows Installer 4.0), a new device
driver development model ("Windows Driver Foundation"), Transactional NTFS,
mobile computing API advancements (power management, Tablet PC Ink support,
SideShow) and major updates to (or complete replacements of) many core
subsystems such as Winlogon and CAPI.
There are some issues for software developers using some of the graphics APIs in
Vista. Games or programs built solely on Vista's version of DirectX, DirectX 10,
cannot work on prior versions of Windows, as DirectX 10 is not backwards-compatible
with those versions. According to a Microsoft blog, there are three
choices for OpenGL implementation on Vista. An application can use the default
implementation, which translates OpenGL calls into the Direct3D API and is frozen at
OpenGL version 1.4, or an application can use an Installable Client Driver (ICD),
which comes in two flavors: legacy and Vista-compatible. A legacy ICD, the kind
already provided by independent hardware vendors targeting Windows XP, disables
the Desktop Window Manager. A Vista-compatible ICD takes advantage of a new
API, and is fully compatible with the Desktop Window Manager. At least two primary
vendors, ATI and NVIDIA, are expected to provide full Vista-compatible ICDs in the
near future. However, hardware overlay is not supported, because it is considered
an obsolete feature in Vista. ATI and NVIDIA strongly recommend using
compositing desktop/FBOs for the same functionality.
Deprecated features
Some notable Windows XP features and components have been replaced or removed
in Windows Vista, including Windows Messenger, the network Messenger Service,
HyperTerminal, MSN Explorer, Active Desktop, and the replacement of NetMeeting
with Windows Meeting Space. Windows Vista also does not include the Windows XP
"Luna" visual theme, or most of the classic color schemes which have been part of
Windows since the Windows 3.x era. The "Hardware profiles" startup feature has
been removed as well, along with support for older motherboard technologies like
the EISA bus, APM and Game port support. There is a way to enable Game port
support on Vista by applying an older driver. IP over FireWire (TCP/IP over IEEE
1394) has been removed as well.
Some traditional features of Windows have been either eliminated or severely
crippled. For example, Wordpad will no longer open or save files in .doc format, and
Windows Sound Recorder has had much of its functionality removed. WinHlp32.exe,
used to display 32-bit .hlp files (help pages), is no longer included in Windows Vista
as Microsoft considers it obsolete, though it is available as a separate download.
Microsoft prohibits software manufacturers from re-introducing the .hlp help system
with their products. Finally, Telnet.exe is no longer installed by default, but is still
included as an installable feature.
Editions
Windows Vista ships in six editions. These editions are roughly divided into two
target markets, consumer and business, with editions varying to cater for specific
sub-markets. For consumers, there are four editions, with three available for
developed countries. Windows Vista Starter edition is limited to emerging markets.
Windows Vista Home Basic is intended for budget users with low needs. Windows
Vista Home Premium covers the majority of the consumer market. Windows Vista
Ultimate contains the complete feature-set and is aimed at enthusiasts. For
businesses, there are two versions. Windows Vista Business is specifically designed
for small businesses, while Windows Vista Enterprise, the premium business edition,
is only available to customers participating in Microsoft's Software Assurance program.
All editions except Windows Vista Starter support both 32-bit (x86) and 64-bit (x64)
processor architectures.
In the European Union, Home Basic N and Business N versions are also available.
These versions come without Windows Media Player, due to EU sanctions brought
against Microsoft for violating anti-trust laws. Similar sanctions exist in South Korea.
Visual styles
Windows Vista has four distinct visual styles.
Windows Aero
Windows Vista's premier visual style is built on a new desktop composition
engine called Desktop Window Manager. Windows Aero introduces support for
3D graphics (Windows Flip 3D), translucency effects (Glass), live
thumbnails, window animations, and other visual effects, and is intended for
mainstream and high-end graphics cards. To enable these features, the
contents of every open window are stored in video memory to facilitate tearing-
free movement of windows. As such, Windows Aero has significantly higher
hardware requirements than its predecessors. 128 MB of graphics memory is
the minimum requirement, depending on the resolution used. Windows Aero
(including Windows Flip 3D) is not included in the Starter and Home Basic
editions.
Windows Vista Standard
This mode is a variation of Windows Aero without the glass effects, window
animations, and other advanced graphical effects such as Windows Flip 3D.
Like Windows Aero, it uses the Desktop Window Manager, and has generally
the same video hardware requirements as Windows Aero. This is the default
mode for the Windows Vista Home Basic Edition. The Starter Edition does not
support this mode.
Windows Vista Basic
This mode has aspects that are similar to Windows XP's visual style with the
addition of subtle animations such as those found on progress bars. It does
not employ the Desktop Window Manager; as such, it does not feature
transparency or translucency, window animation, Windows Flip 3D or any of
the functions provided by the DWM. The Basic mode does not require the new
Windows Display Driver Model (WDDM) for display drivers, and has similar
graphics card requirements to Windows XP. For computers with graphics
cards that are not powerful enough to support Windows Aero, this is the
default graphics mode.
Windows Classic
Windows Classic has the look and feel of Windows 2000 and Windows Server
2003, does not use the Desktop Window Manager, and does not require a
WDDM driver. As with prior versions of Windows, this visual style supports
"color schemes," which are a collection of color settings. Windows Vista
includes six classic color schemes, comprising four high-contrast color
schemes and the default color schemes from Windows 98 and Windows 2000.
Hardware requirements
Computers capable of running Windows Vista are classified as Vista Capable and
Vista Premium Ready. A Vista Capable or equivalent PC is capable of running all
editions of Windows Vista, although some of the special features and high-end
graphics options may require additional or more advanced hardware. A Vista
Premium Ready PC can take advantage of Vista's "high-end" features.
Windows Vista's "Basic" and "Classic" interfaces work with virtually any graphics
hardware that supports Windows XP or 2000; accordingly, most discussion around
Vista's graphics requirements centers on those for the Windows Aero interface. As of
Windows Vista Beta 2, the NVIDIA GeForce 6 series and later, the ATI Radeon 9500
and later, Intel's GMA 950 integrated graphics, and a handful of VIA chipsets and S3
Graphics discrete chips are supported. Although originally supported, the GeForce FX
5 series has been dropped from newer drivers from NVIDIA. The last driver from
NVIDIA to support the GeForce FX series on Vista was 96.85. Microsoft offers a tool
called the Windows Vista Upgrade Advisor to assist XP and Vista users in
determining what versions of Windows their machine is capable of running. Although
the installation media included in retail packages is a 32-bit DVD, customers without
a DVD-ROM drive or customers who wish for 64-bit install media are able to acquire
this media through the Windows Vista Alternate Media program.
Windows Vista system requirements

                    Vista Capable        Vista Premium Ready
Processor           800 MHz              1.0 GHz
Memory              512 MB RAM           1 GB RAM
Graphics card       DirectX 9 capable    DirectX 9 capable GPU with Hardware Pixel Shader v2.0 and WDDM 1.0 driver support
Graphics memory     N/A                  128 MB RAM (supports up to 2,756,000 total pixels, e.g. 1920 × 1200); 512 MB or more for greater resolutions such as 2560 × 1600
HDD capacity        20 GB                40 GB
HDD free space      15 GB                15 GB
Other drives        CD-ROM               DVD-ROM
Service Pack 1
Microsoft occasionally releases service packs for its Windows operating systems to fix
problems and add features. Windows Vista Service Pack 1 (SP1) is currently in
development. Microsoft is planning to release SP1 alongside Windows Server 2008 in
the first quarter of 2008. The first beta of Windows Vista Service Pack 1, build
16659, was released on September 24, 2007 and is currently being tested by
TechBeta participants in the Windows Vista SP1 Beta Program as well as TechNet and
MSDN subscribers.
On December 12, 2007, Microsoft released Windows Vista Service Pack 1 (SP1)
Release Candidate as an open beta to the general public. The RC build is
documented to contain 489 patches, most of which are documented in the Microsoft
Knowledge Base but are unavailable for download.
A whitepaper published by Microsoft near the end of August 2007 outlined the scope
and intent of the service pack, identifying three major areas of improvement:
reliability and performance, administration experience, and support for newer
hardware and standards.
One area of particular note is performance. Areas of improvement include file copy
operations, hibernation, logging off on domain-joined machines, JavaScript parsing in
Internet Explorer, network file share browsing, Windows Explorer ZIP file handling,
and Windows Disk Defragmenter. The ability to choose individual drives to
defragment is being reintroduced as well.
Service Pack 1 introduces support for some new hardware and software standards,
notably the exFAT file system, 802.11n wireless networking, IPv6 over VPN
connections, and the Secure Socket Tunneling Protocol. An updated version of
Windows Installer is included that provides support for multi-package transactions
and embedding the user interface of a child Windows Installer package inside a
parent installation session. Booting a system using Extensible Firmware Interface on
x64 systems is also being introduced; this feature had originally been slated for the
initial release of Vista but was delayed due to a lack of compatible hardware at the
time.
Two areas have seen changes in Service Pack 1 as a result of concerns from
software vendors. One of these is desktop search; users will be
able to change the default desktop search program to one provided by a third party
instead of the Microsoft desktop search program that comes with Windows Vista.
Third-party desktop search programs will be able to seamlessly tie in their services
into the operating system. These changes come in part due to complaints from
Google, whose Google Desktop Search application was hindered by the presence of
Vista's built-in desktop search. In June 2007, Google claimed that the changes being
introduced for Service Pack 1 "are a step in the right direction, but they should be
improved further to give consumers greater access to alternate desktop search
providers ". The other area of note is a set of new security APIs being introduced for
the benefit of antivirus software that currently relies on the unsupported practice of
patching the kernel.
An update to Direct3D, 10.1, is planned for inclusion, which is expected to make
mandatory several features which were previously optional in Direct3D 10 hardware.
The whitepaper also notes that Service Pack 1 will include a kernel that will be up-to-
date with the version to be shipped with Windows Server 2008.
Support for the Group Policy Management Console is being removed; a replacement
is planned for release in the same time frame as the service pack.
Criticism
Windows Vista has received a number of negative assessments. Criticisms of
Windows Vista include protracted development time, more restrictive licensing
terms, the inclusion of a number of technologies aimed at restricting the copying of
protected digital media, and the usability of the new User Account Control security
technology. Reviewers have also noted some similarities between Vista's Aero
interface and that of Apple's Aqua interface for the Mac OS X operating system.
Moreover, some concerns have been raised about many PCs meeting "Vista Premium
Ready" hardware requirements and Vista's pricing.
Hardware requirements
Whilst, according to Microsoft, "nearly all PCs on the market today [2005] will run
Windows Vista”, the higher requirements of some of the 'premium' features, such as
the Aero interface, have impacted many upgraders. According to The Times in May
2006, the full set of features “would be available to less than 5 percent of Britain’s
PC market”. This continuing lack of clarity eventually led to a class-action lawsuit
against Microsoft as people found themselves with new computers that were unable
to run the new software despite assurances.
Slow file operations
When released, Vista performed file operations such as copying and deletion more
slowly than other operating systems. Large copies required when migrating from one
computer to another seemed difficult or impossible without workarounds such as
using the command line. This inability to perform basic file operations efficiently
attracted strong criticism. After six months, Microsoft confirmed the existence of
these problems by releasing a special performance and reliability update, which was
later disseminated through Windows Update, and will be included in SP1.
Licensing and cost
The introduction of additional licensing restrictions has been criticized. Criticism of
upgrade licenses pertaining to Windows Vista Starter through Home Premium was
expressed by Ars Technica's Ken Fisher, who noted that the new requirement of
having a prior operating system already installed was going to cause irritation for
users who reinstall Windows on a regular basis. It has been revealed that an
Upgrade copy of Windows Vista can be installed clean without first installing a previous
version of Windows. On the first install, Windows will refuse to activate. The user
must then reinstall that same copy of Vista. Vista will then activate on the reinstall,
thus allowing a user to install an Upgrade of Windows Vista without owning a
previous operating system. As with Windows XP, separate rules still apply to OEM
versions of Vista installed on new PCs; these are not legally transferable. The cost of
Windows Vista has also been a source of concern and commentary. A majority of
users in a poll said that the prices of various Windows Vista editions posted on the
Microsoft Canada website in August 2006 make the product too expensive. A BBC
News report on the day of Vista's release suggested that, "there may be a backlash
from consumers over its pricing plans - with the cost of Vista versions in the US
roughly half the price of equivalent versions in the UK".
Digital rights management
Another common criticism concerns the integration of new forms of digital rights
management into the operating system, specifically the introduction of the Protected
Video Path. This architecture is designed such that "premium content" from HD DVD
or Blu-ray discs may mandate that the connections between PC components be
encrypted. Devices such as graphic cards must be approved by Microsoft. Depending
on what the content demands, the devices may not pass premium content over non-
encrypted outputs, or they must artificially degrade the quality of the signal on such
outputs, or not display it at all. There is also a revocation mechanism that allows
Microsoft to disable drivers of compromised devices in end-user PCs over the
Internet. Peter Gutmann, security researcher and author of the open source cryptlib
library, claims that these mechanisms violate fundamental rights of the user (such as
fair use), unnecessarily increase the cost of hardware, and make systems less
reliable (the "tilt bit" is a particular worry: if triggered, the entire graphics subsystem
performs a reset) and more vulnerable to denial-of-service attacks. Proponents have
claimed that Microsoft had no choice but to follow the demands of the movie studios,
and that the technology will not actually be enabled until after 2010; Microsoft also
noted that content protection mechanisms have existed in Windows as far back as
Windows Me, and that the new protections will not apply to any existing content
(only future content).
User Account Control
Concerns have been raised about the new User Account Control (UAC) security
technology. While Yankee Group analyst Andrew Jaquith believes that critical security
vulnerabilities may be "reduced by as much as 80%," he also noted that "while the
new security system shows promise, it is far too chatty and annoying". This
statement was made over six months before Vista was actually released. When
Windows Vista was released in November 2006, Microsoft had reduced the number of
operating system tasks that triggered UAC prompts, and added file and registry
virtualization to reduce the number of legacy applications that trigger UAC prompts.
Despite these reductions, UAC prompts are still triggered by many programs,
particularly programs not designed for Windows Vista.
Software Protection Platform
Vista includes an enhanced set of anti-piracy technologies, based on Windows XP's
WGA, called Software Protection Platform (SPP). A major component of this is a new
reduced functionality mode, which Vista enters when it detects that the user has
"failed product activation" or that the copy has been identified as counterfeit or
non-genuine. The mode is described in a Microsoft white paper as follows: "The default Web
browser will be started and the user will be presented with an option to purchase a
new product key. There is no start menu, no desktop icons, and the desktop
background is changed to black. [...] After one hour, the system will log the user out
without warning". This has been criticised for being overly draconian, especially
given reports of "false positives" by SPP's predecessor, and at least one temporary
validation server outage. Microsoft will be removing the reduced functionality mode
in Service Pack 1 in favor of prominent notices on systems not found to be genuine.
Public reception and sales
Due to the large growth of the PC market since the release of Windows XP, initial
sales of the operating system set a new high. Within its first month, 20 million copies
of Vista were sold, double the amount of XP sales within its first month in October
2001, five years earlier. That said, relative to the much larger PC market, Vista sales
were not especially high. For example, rival operating system Mac OS X Leopard's first month's sales
also doubled over the number of sales from the release of Mac OS X Jaguar five
years earlier in August 2002. However, in the case of Jaguar-to-Leopard sales, as
opposed to XP-to-Vista sales, users had less pressure to upgrade due to the
intermediate releases of both Mac OS X Panther and Tiger.
Even with these high sales, Windows Vista has received worse-than-expected public
reception due to the various criticisms and concerns about the operating system,
forcing Microsoft to extend Windows XP support and allow continued sales of the
older OS, as well as a general reduction in the percentage of PC sales.
Windows Vista has received relatively poor reception among consumers, with some
people having installed XP onto computers that were preloaded with Vista or
reverting their own upgrades, and many computer manufacturers are even including
XP restore disks with new computers. There is also a relative market decrease in
consumers intending to purchase PCs: according to studies conducted by
ChangeWave in November 2007, among potential buyers within the next three
months, only 71 percent of consumers (not counting businesses) intend to buy
PCs (with the remaining 29 percent instead favoring Macs), a number down from
approximately 87 percent in October 2005. 83 percent of those intending to
purchase Macs said that they "are choosing Macs because of Leopard and their
distaste for Vista". The same study also showed an increase in businesses intending
to buy Macs instead of Windows-based PCs, up from 4 to 7 percent between
November 2005 and November 2007, partly due to Vista's reception, as well.
Business adoption of Vista has been slower than anticipated, with the vast majority
still favoring Windows XP and even waiting for Windows 7, Microsoft's next version of
Windows scheduled for release in 2010. According to InformationWeek, in December
2006, 6 percent of business enterprises were expected to employ Vista within the
first year, yet as of December 2007, only about 1 percent of enterprises are actually
using Vista. Furthermore, while many businesses have bought licenses to run
Windows Vista, these companies are holding off deployment.
The usage share, as measured through web browser user agent strings, shows Vista
to have gained approximately 6.3% of the desktop OS market as of Q3 2007.
Though starting off slowly, Vista's adoption appears to have accelerated in the last quarter. Even
so, as of December 2007, Amazon.com's bestselling operating systems list indicates
that Vista sales are still trailing the combined sales of Mac OS X Leopard and
Windows XP.
On Sunday, 14 October 2007, the Dutch Consumers Association called for a boycott of
Windows Vista, after the software giant refused to offer free copies of Windows XP to
users who had problems with Vista. The 'Consumentenbond' said that the product
has many teething problems, and "is just not ready". The association claims it
received over 5000 complaints about Vista. The association says that many printers
and other hardware devices failed to work, that computers crash frequently, and that
peripherals are very slow.
Algorithm
In mathematics, computing, linguistics, and related disciplines, an algorithm is a
definite list of well-defined instructions for completing a task that, given an initial
state, will proceed through a well-defined series of successive states, eventually
terminating in an end-state. The transition from one state to the next is not
necessarily deterministic; some algorithms, known as probabilistic algorithms,
incorporate randomness.
The concept of an algorithm originated as a means of recording procedures for
solving mathematical problems such as finding the common divisor of two numbers
or multiplying two numbers. A partial formalization of the concept began with
attempts to solve the Entscheidungsproblem (the "decision problem") that David
Hilbert posed in 1928. Subsequent formalizations were framed as attempts to define
"effective calculability" or "effective method”; those formalizations included the
Gödel-Herbrand-Kleene recursive functions of 1930, 1934 and 1935, Alonzo Church's
lambda calculus of 1936, Emil Post's "Formulation I" of 1936, and Alan Turing's
Turing machines of 1936-7 and 1939.
Etymology
Al-Khwārizmī, Persian astronomer and mathematician, wrote a treatise in Arabic in
825 AD, On Calculation with Hindu Numerals. It was translated into Latin in the 12th
century as Algoritmi de numero Indorum, which title was likely intended to mean
"Algoritmus on the numbers of the Indians", where "Algoritmi" was the translator's
rendition of the author's name in the genitive case; but people misunderstanding the
title treated Algoritmi as a Latin plural and this led to the word "algorithm" (Latin
algorismus) coming to mean "calculation method". The intrusive "th" is most likely
due to a false cognate with the Greek αριθμος (arithmos) meaning "number".
Flowcharts are often used to graphically represent algorithms.
Why algorithms are necessary: an informal definition
No generally accepted formal definition of "algorithm" exists yet. We can, however,
derive clues to the issues involved and an informal meaning of the word from the
following quotation from Boolos and Jeffrey:
"No human being can write fast enough, or long enough, or small enough to
list all members of an enumerably infinite set by writing out their names, one
after another, in some notation. But humans can do something equally useful,
in the case of certain enumerably infinite sets: They can give explicit
instructions for determining the nth member of the set, for arbitrary
finite n. Such instructions are to be given quite explicitly, in a form in which
they could be followed by a computing machine, or by a human who is
capable of carrying out only very elementary operations on symbols" .
The words "enumerably infinite" mean "countable using integers perhaps extending
to infinity". Thus Boolos and Jeffrey are saying that an algorithm implies instructions
for a process that "creates" output integers from an arbitrary "input" integer or
integers that, in theory, can be chosen from 0 to infinity. Thus we might expect an
algorithm to be an algebraic equation such as y = m + n — two arbitrary "input
variables" m and n that produce an output y. As we see in Algorithm
characterizations — the word algorithm implies much more than this, something on
the order of (for our addition example):
Precise instructions (in language understood by "the computer") for a "fast,
efficient, good" process that specifies the "moves" of "the computer"
(machine or human, equipped with the necessary internally-contained
information and capabilities) to find, decode, and then munch arbitrary input
integers/symbols m and n, symbols + and = ... and (reliably, correctly,
"effectively") produce, in a "reasonable" time, output-integer y at a specified
place and in a specified format.
The concept of algorithm is also used to define the notion of decidability (logic). That
notion is central for explaining how formal systems come into being starting from a
small set of axioms and rules. In logic, the time that an algorithm requires to
complete cannot be measured, as it is not apparently related to our customary
physical dimensions. From such uncertainties, which characterize ongoing work, stems
the unavailability of a definition of algorithm that suits both concrete (in some sense)
and abstract usage of the term.
For a detailed presentation of the various points of view around the definition
of "algorithm" see Algorithm characterizations. For examples of simple
addition algorithms specified in the detailed manner described in Algorithm
characterizations, see Algorithm examples.
Formalization of algorithms
Algorithms are essential to the way computers process information, because a
computer program is essentially an algorithm that tells the computer what specific
steps to perform (in what specific order) in order to carry out a specified task, such
as calculating employees’ paychecks or printing students’ report cards. Thus, an
algorithm can be considered to be any sequence of operations that can be performed
by a Turing-complete system. Authors who assert this thesis include Savage and
Gurevich:
"...Turing's informal argument in favor of his thesis justifies a stronger thesis:
every algorithm can be simulated by a Turing machine" ...according to
Savage, "an algorithm is a computational process defined by a Turing
machine."
Typically, when an algorithm is associated with processing information, data are read
from an input source or device, written to an output sink or device, and/or stored for
further processing. Stored data are regarded as part of the internal state of the
entity performing the algorithm. In practice, the state is stored in a data structure,
but an algorithm requires the internal data only for specific operation sets called
abstract data types.
For any such computational process, the algorithm must be rigorously defined:
specified in the way it applies in all possible circumstances that could arise. That is,
any conditional steps must be systematically dealt with, case-by-case; the criteria
for each case must be clear (and computable).
Because an algorithm is a precise list of precise steps, the order of computation will
almost always be critical to the functioning of the algorithm. Instructions are usually
assumed to be listed explicitly, and are described as starting 'from the top' and going
'down to the bottom', an idea that is described more formally by flow of control.
So far, this discussion of the formalization of an algorithm has assumed the premises
of imperative programming. This is the most common conception, and it attempts to
describe a task in discrete, 'mechanical' means. Unique to this conception of
formalized algorithms is the assignment operation, setting the value of a variable. It
derives from the intuition of 'memory' as a scratchpad. There is an example below of
such an assignment.
For some alternate conceptions of what constitutes an algorithm see functional
programming and logic programming .
Termination
Some writers restrict the definition of algorithm to procedures that eventually finish.
In such a category Kleene places the "decision procedure or decision method or
algorithm for the question". Others, including Kleene, include procedures that could
run forever without stopping; such a procedure has been called a "computational
method" or "calculation procedure or algorithm"; however, Kleene notes that such a
method must eventually exhibit "some object".
Minsky makes the pertinent observation, in regards to determining whether an
algorithm will eventually terminate (from a particular starting state):
"But if the length of the process is not known in advance, then 'trying' it may
not be decisive, because if the process does go on forever — then at no time
will we ever be sure of the answer"
As it happens, no other method can do any better, as was shown by Alan Turing with
his celebrated result on the undecidability of the so-called halting problem. There is
no algorithmic procedure for determining of arbitrary algorithms whether or not they
terminate from given starting states. The analysis of algorithms for their likelihood of
termination is called Termination analysis.
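To make the difficulty concrete, consider the following short Python sketch (an illustration added here, not part of the sources above). It computes the well-known Collatz iteration; the loop has halted for every starting value ever tried, yet no general proof of termination is known, so simply "trying it" cannot settle the question for all inputs.

def collatz_steps(n):
    # Halve even numbers, map odd numbers to 3n + 1, and count the steps.
    # Nobody has proved that this loop terminates for every positive integer n.
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

print(collatz_steps(27))   # 111 -- but only running it tells us that it stops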
In the case of a non-halting computation method (calculation procedure), success can
no longer be defined in terms of halting with a meaningful output. Instead, terms of
success that allow for unbounded output sequences must be defined. For example,
an algorithm that verifies if there are more zeros than ones in an infinite random
binary sequence must run forever to be effective. If it is implemented correctly,
however, the algorithm's output will be useful: for as long as it examines the
sequence, the algorithm will give a positive response as long as the examined
zeros outnumber the examined ones, and a negative response otherwise. Success for this
algorithm could then be defined as eventually outputting only positive responses if
there are actually more zeros than ones in the sequence, and in any other case
outputting any mixture of positive and negative responses.
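As a rough sketch of such a non-halting procedure, the following Python generator (our illustration, not drawn from the sources above) consumes an endless stream of bits and, after each one, reports whether the zeros seen so far outnumber the ones. It never halts, yet every individual response is produced in finite time.

import itertools, random

def more_zeros_than_ones(bits):
    # 'bits' may be an infinite iterable of 0s and 1s.
    zeros = ones = 0
    for b in bits:
        if b == 0:
            zeros += 1
        else:
            ones += 1
        yield zeros > ones   # positive response while zeros outnumber ones

random_bits = (random.randint(0, 1) for _ in itertools.count())
for answer in itertools.islice(more_zeros_than_ones(random_bits), 10):
    print(answer)            # inspect only the first 10 of infinitely many answers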
See the examples of (im-)"proper" subtraction at partial function for more about
what can happen when an algorithm fails for certain of its input numbers — e.g. (i)
non-termination, (ii) production of "junk" (output in the wrong format to be
considered a number) or no number(s) at all (halt ends the computation with no
output), (iii) wrong number(s), or (iv) a combination of these. Kleene proposed that
the production of "junk" or failure to produce a number is solved by having the
algorithm detect these instances and produce e.g. an error message (he suggested
"0"), or preferably, force the algorithm into an endless loop. Davis does this to his
subtraction algorithm — he fixes his algorithm in a second example so that it is
proper subtraction. Along with the logical outcomes "true" and "false" Kleene also
proposes the use of a third logical symbol "u" — undecided — thus an algorithm will
always produce something when confronted with a "proposition". The problem of
wrong answers must be solved with an independent "proof" of the algorithm e.g.
using induction:
"We normally require auxiliary evidence for this (that the algorithm correctly
defines a mu recursive function), e.g. in the form of an inductive proof that,
for each argument value, the computation terminates with a unique value"
Expressing algorithms
Algorithms can be expressed in many kinds of notation, including natural languages,
pseudocode, flowcharts, and programming languages. Natural language expressions
of algorithms tend to be verbose and ambiguous, and are rarely used for complex or
technical algorithms. Pseudocode and flowcharts are structured ways to express
algorithms that avoid many of the ambiguities common in natural language
statements, while remaining independent of a particular implementation language.
Programming languages are primarily intended for expressing algorithms in a form
that can be executed by a computer, but are often used as a way to define or
document algorithms.
There is a wide variety of representations possible and one can express a given
Turing machine program as a sequence of machine tables (see more at finite state
machine and state transition table), as flowcharts (see more at state diagram), or as
a form of rudimentary machine code or assembly code called "sets of quadruples"
(see more at Turing machine).
Sometimes it is helpful in the description of an algorithm to supplement small "flow
charts" (state diagrams) with natural-language and/or arithmetic expressions written
inside "block diagrams" to summarize what the "flow charts" are accomplishing.
Representations of algorithms are generally classed into three accepted levels of
Turing machine description:
1 High-level description:
"...prose to describe an algorithm, ignoring the implementation details. At this
level we do not need to mention how the machine manages its tape or head"
2 Implementation description:
"...prose used to define the way the Turing machine uses its head and the
way that it stores data on its tape. At this level we do not give details of
states or transition function"
3 Formal description:
Most detailed, "lowest level", gives the Turing machine's "state table".
123
For an example, the simple algorithm "Add m+n" can be described at all three of
these levels.
Implementation
Most algorithms are intended to be implemented as computer programs. However,
algorithms are also implemented by other means, such as in a biological neural
network (for example, the human brain implementing arithmetic or an insect looking
for food), in an electrical circuit, or in a mechanical device.
Example
One of the simplest algorithms is to find the largest number in an (unsorted) list of
numbers. The solution necessarily requires looking at every number in the list, but
only once at each. From this follows a simple algorithm, which can be stated in
English prose as a high-level description:
High-level description:
1. Assume the first item is largest.
2. Look at each of the remaining items in the list and if it is larger than the
largest item so far, make a note of it.
3. The last noted item is the largest in the list when the process is complete.
(Quasi-) Formal description: Written in prose but much closer to the high-level
language of a computer program, the following is the more formal coding of the
algorithm in pseudocode or pidgin code:

    Algorithm LargestNumber
      Input: A non-empty list of numbers L.
      Output: The largest number in the list L.

      largest ← L0
      for each item in the list L≥1, do
        if the item > largest, then
          largest ← the item
      return largest
124
"←" is a loose shorthand for "changes to". For instance, "largest ← item"
means that the value of largest changes to the value of item.
"return" terminates the algorithm and outputs the value that follows.
For a more complex example of an algorithm, see Euclid's algorithm for the greatest
common divisor, one of the earliest algorithms known.
Algorithm analysis
It is frequently important to know how much of a particular resource (such as
time or storage) is required for a given algorithm. Methods have been developed for
the analysis of algorithms to obtain such quantitative answers; for example, the
algorithm above has a time requirement of O(n), using the big O notation with n as
the length of the list. At all times the algorithm only needs to remember two values:
the largest number found so far, and its current position in the input list. Therefore it
is said to have a space requirement of O(1). (Note that the size of the inputs is not
counted as space used by the algorithm.)
Different algorithms may complete the same task with a different set of instructions
in less or more time, space, or effort than others. For example, given two different
recipes for making potato salad, one may call for peeling the potatoes before boiling
them, while the other presents the steps in the reverse order, yet they both call for these
steps to be repeated for all potatoes and end when the potato salad is ready to be
eaten.
The analysis and study of algorithms is a discipline of computer science, and is often
practiced abstractly without the use of a specific programming language or
implementation. In this sense, algorithm analysis resembles other mathematical
disciplines in that it focuses on the underlying properties of the algorithm and not on
the specifics of any particular implementation. Usually pseudocode is used for
analysis as it is the simplest and most general representation.
Classes
There are various ways to classify algorithms, each with its own merits.
125
Classification by implementation
One way to classify algorithms is by implementation means.
Recursion or iteration: A recursive algorithm is one that invokes (makes
reference to) itself repeatedly until a certain condition is met, which is a
method common to functional programming. Iterative algorithms use
repetitive constructs like loops and sometimes additional data structures like
stacks to solve the given problems. Some problems are naturally suited for
one implementation or the other. For example, the Towers of Hanoi puzzle is well
understood in its recursive implementation. Every recursive version has an
equivalent (but possibly more or less complex) iterative version, and vice
versa.
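To make the contrast between the two styles concrete, here is a minimal sketch in
Python: the factorial function written once recursively and once iteratively. The
function names are illustrative only.

    def factorial_recursive(n):
        # The function invokes itself until the base case n == 0 is reached.
        if n == 0:
            return 1
        return n * factorial_recursive(n - 1)

    def factorial_iterative(n):
        # The same computation expressed with a loop instead of recursion.
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result

    assert factorial_recursive(5) == factorial_iterative(5) == 120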
Logical: An algorithm may be viewed as controlled logical deduction. This
notion may be expressed as:
Algorithm = logic + control.
The logic component expresses the axioms that may be used in the
computation and the control component determines the way in which
deduction is applied to the axioms. This is the basis for the logic programming
paradigm. In pure logic programming languages the control component is
fixed and algorithms are specified by supplying only the logic component. The
appeal of this approach is its elegant semantics: a change in the axioms yields a
well-defined change in the algorithm.
Serial or parallel or distributed: Algorithms are usually discussed with the
assumption that computers execute one instruction of an algorithm at a time.
Those computers are sometimes called serial computers. An algorithm
designed for such an environment is called a serial algorithm, as opposed to
parallel algorithms or distributed algorithms. Parallel algorithms take
advantage of computer architectures where several processors can work on a
problem at the same time, whereas distributed algorithms utilise multiple
machines connected with a network. Parallel or distributed algorithms divide
the problem into more symmetrical or asymmetrical subproblems and collect
the results back together. The resource consumption in such algorithms is not
126
only processor cycles on each processor but also the communication overhead
between the processors. Sorting algorithms can be parallelized efficiently, but
their communication overhead is expensive. Iterative algorithms are generally
parallelizable. Some problems have no parallel algorithms, and are called
inherently serial problems.
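As a rough sketch of the serial/parallel distinction just described, the Python fragment
below sums a list serially and then in parallel with the standard concurrent.futures
module: the input is divided into chunks, separate processes reduce the chunks
independently, and the partial results are collected back together. The chunk size and
helper names are illustrative only.

    from concurrent.futures import ProcessPoolExecutor

    def chunk_sum(chunk):
        # Work performed independently on one subproblem.
        return sum(chunk)

    def parallel_sum(numbers, chunk_size=1000):
        chunks = [numbers[i:i + chunk_size]
                  for i in range(0, len(numbers), chunk_size)]
        with ProcessPoolExecutor() as pool:
            partial_sums = pool.map(chunk_sum, chunks)   # distribute the subproblems
        return sum(partial_sums)                         # collect the results together

    if __name__ == "__main__":
        data = list(range(10_000))
        assert parallel_sum(data) == sum(data)           # same answer as the serial sum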
Deterministic or non-deterministic: Deterministic algorithms solve the
problem with an exact decision at every step of the algorithm, whereas non-
deterministic algorithms solve problems via guessing, although typical guesses
are made more accurate through the use of heuristics.
Exact or approximate: While many algorithms reach an exact solution,
approximation algorithms seek an approximation that is close to the true
solution. Approximation may use either a deterministic or a random strategy.
Such algorithms have practical value for many hard problems.
Classification by design paradigm
Another way of classifying algorithms is by their design methodology or paradigm.
There is a certain number of paradigms, each different from the other. Furthermore,
each of these categories will include many different types of algorithms. Some
commonly found paradigms include:
Divide and conquer. A divide and conquer algorithm repeatedly reduces an
instance of a problem to one or more smaller instances of the same problem
(usually recursively), until the instances are small enough to solve easily. One
example of divide and conquer is merge sort: the data is divided into segments,
each segment is sorted, and the sorted segments are merged in the conquer phase
to sort the entire data set. A simpler variant of divide and conquer is the
decrease and conquer algorithm, which reduces the problem to a single smaller
subproblem and uses the solution of that subproblem to solve the bigger problem.
Because divide and conquer divides the problem into multiple subproblems, its
conquer stage is more complex than that of decrease and conquer algorithms. An
example of a decrease and conquer algorithm is the binary search algorithm.
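A short, hedged sketch of both ideas in Python: merge sort as divide and conquer,
and binary search as decrease and conquer. The helper names are illustrative only.

    def merge_sort(items):
        # Divide: split the list in half; conquer: merge the two sorted halves.
        if len(items) <= 1:
            return items
        mid = len(items) // 2
        left = merge_sort(items[:mid])
        right = merge_sort(items[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        return merged + left[i:] + right[j:]

    def binary_search(sorted_items, target):
        # Decrease and conquer: each step discards half of the remaining range.
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if sorted_items[mid] == target:
                return mid
            if sorted_items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1   # not found

    print(merge_sort([5, 2, 9, 1]))          # [1, 2, 5, 9]
    print(binary_search([1, 2, 5, 9], 5))    # 2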
Dynamic programming. When a problem shows optimal substructure,
meaning the optimal solution to a problem can be constructed from optimal
127
solutions to subproblems, and overlapping subproblems, meaning the same
subproblems are used to solve many different problem instances, a quicker
approach called dynamic programming avoids recomputing solutions that
have already been computed. For example, the shortest path to a goal from a
vertex in a weighted graph can be found by using the shortest path to the
goal from all adjacent vertices. Dynamic programming and memoization go
together. The main difference between dynamic programming and divide and
conquer is that subproblems are more or less independent in divide and
conquer, whereas subproblems overlap in dynamic programming. The
difference between dynamic programming and straightforward recursion is in
caching or memoization of recursive calls. When subproblems are
independent and there is no repetition, memoization does not help; hence
dynamic programming is not a solution for all complex problems. By using
memoization or maintaining a table of subproblems already solved, dynamic
programming reduces the exponential nature of many problems to polynomial
complexity.
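A minimal sketch of the idea in Python, using the Fibonacci numbers as the standard
illustration: the naive recursion recomputes overlapping subproblems exponentially
often, while the memoized version caches each subproblem so it is solved only once.

    from functools import lru_cache

    def fib_naive(n):
        # Overlapping subproblems are recomputed many times: exponential running time.
        return n if n < 2 else fib_naive(n - 1) + fib_naive(n - 2)

    @lru_cache(maxsize=None)
    def fib_memo(n):
        # Memoization keeps a table of solved subproblems: each value is computed once.
        return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

    print(fib_memo(40))   # 102334155, returned almost instantly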
The greedy method. A greedy algorithm is similar to a dynamic
programming algorithm, but the difference is that solutions to the
subproblems do not have to be known at each stage; instead a "greedy"
choice can be made of what looks best for the moment. The greedy method
extends the solution with the best possible decision (not all feasible decisions)
at an algorithmic stage based on the current local optimum and the best
decision (not all possible decisions) made in the previous stage. It is not
exhaustive, and does not give an accurate answer to many problems. But when
it works, it will be the fastest method. The most popular greedy algorithm is
finding the minimum spanning tree, as given by Kruskal's algorithm.
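A small illustrative sketch of the greedy method in Python (a simpler example than
Kruskal's algorithm itself): making change by always taking the largest coin that
still fits. This greedy choice happens to be optimal for canonical coin systems such
as 50/20/10/5/1, but not for every possible set of denominations, which illustrates
why greedy methods do not give an accurate answer to all problems. The function
name and denominations are assumptions made for the example.

    def greedy_change(amount, denominations=(50, 20, 10, 5, 1)):
        """Pick the locally best (largest) coin at every step."""
        coins = []
        for coin in sorted(denominations, reverse=True):
            while amount >= coin:
                coins.append(coin)
                amount -= coin
        return coins

    print(greedy_change(87))   # [50, 20, 10, 5, 1, 1]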
Linear programming. When solving a problem using linear programming,
specific inequalities involving the inputs are found and then an attempt is
made to maximize (or minimize) some linear function of the inputs. Many
problems (such as the maximum flow for directed graphs) can be stated in a
linear programming way, and then be solved by a 'generic' algorithm such as
the simplex algorithm. A more complex variant of linear programming is
called integer programming, where the solution space is restricted to the
integers.
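As a hedged sketch, the fragment below solves a tiny linear program with SciPy's
linprog solver, assuming the scipy package is installed; the coefficients are made up
for illustration. linprog minimizes c·x subject to A_ub·x ≤ b_ub with x ≥ 0 by
default, so a maximization is expressed by negating the objective.

    from scipy.optimize import linprog

    # Maximize 3x + 2y subject to x + y <= 4 and x <= 3 (with x, y >= 0).
    result = linprog(c=[-3, -2],             # negate to turn maximization into minimization
                     A_ub=[[1, 1], [1, 0]],
                     b_ub=[4, 3])
    print(result.x, -result.fun)             # optimal point [3, 1] and maximized value 11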
128
Reduction. This technique involves solving a difficult problem by transforming
it into a better known problem for which we have (hopefully) asymptotically
optimal algorithms. The goal is to find a reducing algorithm whose complexity
is not dominated by the resulting reduced algorithm's. For example, one
selection algorithm for finding the median in an unsorted list involves first
sorting the list (the expensive portion) and then pulling out the middle
element in the sorted list (the cheap portion). This technique is also known as
transform and conquer.
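The selection example just mentioned is short enough to show directly; a minimal
Python sketch, with an illustrative function name:

    def median_by_sorting(values):
        # Reduce selection to sorting: sort first (the expensive part),
        # then pull out the middle element (the cheap part).
        ordered = sorted(values)
        return ordered[len(ordered) // 2]

    print(median_by_sorting([7, 1, 9, 3, 5]))   # 5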
Search and enumeration. Many problems (such as playing chess) can be
modeled as problems on graphs. A graph exploration algorithm specifies rules
for moving around a graph and is useful for such problems. This category also
includes search algorithms, branch and bound enumeration and backtracking.
The probabilistic and heuristic paradigm. Algorithms belonging to this
class fit the definition of an algorithm more loosely.
1. Probabilistic algorithms are those that make some choices randomly (or
pseudo-randomly); for some problems, it can in fact be proven that the
fastest solutions must involve some randomness.
2. Genetic algorithms attempt to find solutions to problems by mimicking
biological evolutionary processes, with a cycle of random mutations yielding
successive generations of "solutions". Thus, they emulate reproduction and
"survival of the fittest". In genetic programming, this approach is extended to
algorithms, by regarding the algorithm itself as a "solution" to a problem.
3. Heuristic algorithms, whose general purpose is not to find an optimal solution,
but an approximate solution when the time or resources needed to find a perfect
solution are impractical. Examples include local search, tabu search, and
simulated annealing algorithms, a class of heuristic
probabilistic algorithms that vary the solution of a problem by a random
amount. The name "simulated annealing" alludes to the metallurgic term
meaning the heating and cooling of metal to achieve freedom from defects.
The purpose of the random variance is to find close to globally optimal
solutions rather than simply locally optimal ones, the idea being that the
random element will be decreased as the algorithm settles down to a solution.
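As a brief illustration of the probabilistic paradigm described in item 1, the
following hedged Python sketch estimates π with a Monte Carlo method: random points
are thrown into the unit square, and the fraction that lands inside the quarter
circle approaches π/4. The sample count is arbitrary.

    import random

    def estimate_pi(samples=100_000):
        inside = 0
        for _ in range(samples):
            x, y = random.random(), random.random()
            if x * x + y * y <= 1.0:      # the point falls inside the quarter circle
                inside += 1
        return 4 * inside / samples

    print(estimate_pi())   # approximately 3.14, varying slightly from run to run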
Classification by field of study
129
Every field of science has its own problems and needs efficient algorithms. Related
problems in one field are often studied together. Some example classes are search
algorithms, sorting algorithms, merge algorithms, numerical algorithms, graph
algorithms, string algorithms, computational geometric algorithms, combinatorial
algorithms, machine learning, cryptography, data compression algorithms and
parsing techniques.
Fields tend to overlap with each other, and algorithm advances in one field may
improve those of other, sometimes completely unrelated, fields. For example,
dynamic programming was originally invented for optimization of resource
consumption in industry, but is now used in solving a broad range of problems in
many fields.
Classification by complexity
Algorithms can be classified by the amount of time they need to complete compared
to their input size. There is a wide variety: some algorithms complete in linear time
relative to input size, some do so in an exponential amount of time or even worse,
and some never halt. Additionally, some problems may have multiple algorithms of
differing complexity, while other problems might have no algorithms or no known
efficient algorithms. There are also mappings from some problems to other
problems. Owing to this, it was found to be more suitable to classify the problems
themselves instead of the algorithms into equivalence classes based on the
complexity of the best possible algorithms for them.
Legal issues
Algorithms, by themselves, are not usually patentable. In the United States, a claim
consisting solely of simple manipulations of abstract concepts, numbers, or signals
does not constitute a "process", and hence such algorithms are not patentable. However,
practical applications of algorithms are sometimes patentable. For example, in
Diamond v. Diehr, the application of a simple feedback algorithm to aid in the curing
of synthetic rubber was deemed patentable. The patenting of software is highly
controversial, and there are highly criticized patents involving algorithms, especially
data compression algorithms, such as Unisys' LZW patent.
History: Development of the notion of "algorithm"
130
Origin of the word
The word algorithm comes from the name of the 9th century Persian mathematician
Abu Abdullah Muhammad ibn Musa al-Khwarizmi whose works introduced Indian
numerals and algebraic concepts. He worked in Baghdad at the time when it was the
centre of scientific studies and trade. The word algorism originally referred only to
the rules of performing arithmetic using Arabic numerals but evolved via European
Latin translation of al-Khwarizmi's name into algorithm by the 18th century. The
word evolved to include all definite procedures for solving problems or performing
tasks.
Discrete and distinguishable symbols
Tally-marks: To keep track of their flocks, their sacks of grain and their money the
ancients used tallying: accumulating stones or marks scratched on sticks, or making
discrete symbols in clay. Through the Babylonian and Egyptian use of marks and
symbols, eventually Roman numerals and the abacus evolved. Tally marks appear
prominently in unary numeral system arithmetic used in Turing machine and Post-
Turing machine computations.
Manipulation of symbols as "place holders" for numbers: algebra
The work of the ancient Greek geometers, Persian mathematician Al-Khwarizmi —
often considered as the "father of algebra", and Western European mathematicians
culminated in Leibniz's notion of the calculus ratiocinator:
"A good century and a half ahead of his time, Leibniz proposed an algebra of
logic, an algebra that would specify the rules for manipulating logical concepts
in the manner that ordinary algebra specifies the rules for manipulating
numbers".
Mechanical contrivances with discrete states
The clock: Bolter credits the invention of the weight-driven clock as “The key
invention [of Europe in the Middle Ages]", in particular the verge escapement that
131
provides us with the tick and tock of a mechanical clock. “The accurate automatic
machine” led immediately to "mechanical automata" beginning in the thirteenth
century and finally to “computational machines" – the difference engine and
analytical engines of Charles Babbage and Countess Ada Lovelace.
Jacquard loom, Hollerith punch cards, telegraphy and telephony — the
electromechanical relay: Bell and Newell indicate that the Jacquard loom,
precursor to Hollerith cards, and “telephone switching technologies” were the roots
of a tree leading to the development of the first computers. By the mid-1800s the
telegraph, the precursor of the telephone, was in use throughout the world, its
discrete and distinguishable encoding of letters as “dots and dashes” a common
sound. By the late 1800s the ticker tape was in use, as was the use of Hollerith cards
in the 1890 U.S. census. Then came the Teletype with its punched-paper use of
Baudot code on tape.
Telephone-switching networks of electromechanical relays (invented 1835) were
behind the work of George Stibitz, the inventor of the digital adding device. As he
worked in Bell Laboratories, he observed the “burdensome” use of mechanical
calculators with gears. "He went home one evening in 1937 intending to test his
idea.... When the tinkering was over, Stibitz had constructed a binary adding
device".
Davis observes the particular importance of the electromechanical relay (with its two
"binary states" open and closed):
"It was only with the development, beginning in the 1930s, of
electromechanical calculators using electrical relays, that machines were built
having the scope Babbage had envisioned."
Mathematics during the 1800s up to the mid-1900s
Symbols and rules: In rapid succession the mathematics of George Boole, Gottlob
Frege, and Giuseppe Peano reduced arithmetic to a sequence of symbols
manipulated by rules. Peano's The principles of arithmetic, presented by a new
132
method was "the first attempt at an axiomatization of mathematics in a symbolic
language".
But Heijenoort gives Frege this kudos: Frege’s is "perhaps the most important single
work ever written in logic. ... in which we see a " 'formula language', that is a lingua
characterica, a language written with special symbols, "for pure thought", that is,
free from rhetorical embellishments ... constructed from specific symbols that are
manipulated according to definite rules". The work of Frege was further simplified
and amplified by Alfred North Whitehead and Bertrand Russell in their Principia
Mathematica.
The paradoxes: At the same time a number of disturbing paradoxes appeared in
the literature, in particular the Burali-Forti paradox, the Russell paradox, and the
Richard Paradox. The resultant considerations led to Kurt Gödel’s paper — he
specifically cites the paradox of the liar — that completely reduces rules of recursion
to numbers.
Effective calculability: In an effort to solve the Entscheidungsproblem defined
precisely by Hilbert in 1928, mathematicians first set about to define what was
meant by an "effective method" or "effective calculation" or "effective calculability"
(i.e. a calculation that would succeed). In rapid succession the following appeared:
Alonzo Church, Stephen Kleene and J.B. Rosser's λ-calculus, a finely-honed definition
of "general recursion" from the work of Gödel acting on suggestions of Jacques
Herbrand and subsequent simplifications by Kleene, Church's proof that the
Entscheidungsproblem was unsolvable, Emil Post's definition of effective calculability
as a worker mindlessly following a list of instructions to move left or right through a
sequence of rooms and while there either mark or erase a paper or observe the
paper and make a yes-no decision about the next instruction, Alan Turing's proof
that the Entscheidungsproblem was unsolvable by use of his "a- [automatic-]
machine" -- in effect almost identical to Post's "formulation", J. Barkley Rosser's
definition of "effective method" in terms of "a machine", S. C. Kleene's proposal of a
precursor to "Church thesis" that he called "Thesis I", and a few years later Kleene's
renaming his Thesis "Church's Thesis" and proposing "Turing's Thesis".
Emil Post and Alan Turing
133
Here is a remarkable coincidence of two men not knowing each other but describing
a process of men-as-computers working on computations — and they yield virtually
identical definitions.
Emil Post described the actions of a "computer" (human being) as follows:
"...two concepts are involved: that of a symbol space in which the work
leading from problem to answer is to be carried out, and a fixed unalterable
set of directions.
His symbol space would be
"a two way infinite sequence of spaces or boxes... The problem solver or
worker is to move and work in this symbol space, being capable of being in,
and operating in but one box at a time.... a box is to admit of but two
possible conditions, i.e. being empty or unmarked, and having a single mark
in it, say a vertical stroke.
"One box is to be singled out and called the starting point. ...a specific
problem is to be given in symbolic form by a finite number of boxes [i.e.
INPUT] being marked with a stroke. Likewise the answer [i.e. OUTPUT] is to
be given in symbolic form by such a configuration of marked boxes....
"A set of directions applicable to a general problem sets up a deterministic
process when applied to each specific problem. This process will terminate
only when it comes to the direction of type (C ) [i.e. STOP]."
Alan Turing’s work preceded that of Stibitz; it is unknown if Stibitz knew of the work
of Turing. Turing’s biographer believed that Turing’s use of a typewriter-like model
derived from a youthful interest: “ Alan had dreamt of inventing typewriters as a
boy; Mrs. Turing had a typewriter; and he could well have begun by asking himself
what was meant by calling a typewriter 'mechanical' ”. Given the prevalence of Morse
code and telegraphy, ticker tape machines, and Teletypes we might conjecture that
all were influences.
Turing — his model of computation is now called a Turing machine — begins, as did
Post, with an analysis of a human computer that he whittles down to a simple set of
basic motions and "states of mind". But he continues a step further and creates a
machine as a model of computation of numbers:
134
"Computing is normally done by writing certain symbols on paper. We may
suppose this paper is divided into squares like a child's arithmetic book....I
assume then that the computation is carried out on one-dimensional paper,
i.e. on a tape divided into squares. I shall also suppose that the number of
symbols which may be printed is finite....
"The behavior of the computer at any moment is determined by the symbols
which he is observing, and his "state of mind" at that moment. We may
suppose that there is a bound B to the number of symbols or squares which
the computer can observe at one moment. If he wishes to observe more, he
must use successive observations. We will also suppose that the number of
states of mind which need be taken into account is finite...
"Let us imagine that the operations performed by the computer to be split up
into 'simple operations' which are so elementary that it is not easy to imagine
them further divided".
Turing's reduction yields the following:
"The simple operations must therefore include:
"(a) Changes of the symbol on one of the observed squares
"(b) Changes of one of the squares observed to another square within L
squares of one of the previously observed squares.
"It may be that some of these change necessarily invoke a change of state of mind.
The most general single operation must therefore be taken to be one of the
following:
"(A) A possible change (a) of symbol together with a possible change of state
of mind.
"(B) A possible change (b) of observed squares, together with a possible
change of state of mind"
"We may now construct a machine to do the work of this computer."
A few years later, Turing expanded his analysis (thesis, definition) with this forceful
expression of it:
"A function is said to be "effectivey calculable" if its values can be found by
some purely mechanical process. Although it is fairly easy to get an intuitive
135
grasp of this idea, it is nevertheless desirable to have some more definite,
mathematically expressible definition . . . [he discusses the history of the
definition pretty much as presented above with respect to Gödel, Herbrand,
Kleene, Church, Turing and Post] . . . We may take this statement literally,
understanding by a purely mechanical process one which could be carried out
by a machine. It is possible to give a mathematical description, in a certain
normal form, of the structures of these machines. The development of these
ideas leads to the author's definition of a computable function, and to an
identification of computability † with effective calculability . . . .
"† We shall use the expression "computable function" to mean a function
calculable by a machine, and we let "effectively calculable" refer to the
intuitive idea without particular identification with any one of these
definitions".
J. B. Rosser and S. C. Kleene
J. Barkley Rosser boldly defined an ‘effective [mathematical] method’ in the following
manner (boldface added):
"'Effective method' is used here in the rather special sense of a method each
step of which is precisely determined and which is certain to produce the
answer in a finite number of steps. With this special meaning, three different
precise definitions have been given to date. The simplest of these to state
(due to Post and Turing) says essentially that an effective method of
solving certain sets of problems exists if one can build a machine
which will then solve any problem of the set with no human
intervention beyond inserting the question and (later) reading the
answer. All three definitions are equivalent, so it doesn't matter which one is
used. Moreover, the fact that all three are equivalent is a very strong
argument for the correctness of any one."
Rosser's footnote #5 references the work of (1) Church and Kleene and their
definition of λ-definability, in particular Church's use of it in his An Unsolvable
Problem of Elementary Number Theory; (2) Herbrand and Gödel and their use of
recursion in particular Gödel's use in his famous paper On Formally Undecidable
136
Propositions of Principia Mathematica and Related Systems I; and (3) Post and
Turing in their mechanism-models of computation.
Stephen C. Kleene defined as his now-famous "Thesis I" what is now known as "the Church-
Turing Thesis". But he did this in the following context (boldface in original):
"12. Algorithmic theories... In setting up a complete algorithmic theory,
what we do is to describe a procedure, performable for each set of values of
the independent variables, which procedure necessarily terminates and in
such manner that from the outcome we can read a definite answer, "yes" or
"no," to the question, "is the predicate value true?”"
History after 1950
A number of efforts have been directed toward further refinement of the definition of
"algorithm", and activity is on-going because of issues surrounding, in particular,
foundations of mathematics (especially the Church-Turing Thesis) and philosophy of
mind (especially arguments around artificial intelligence). For more, see Algorithm
characterizations.
137
Compiler
A diagram of the operation of a typical multi-language, multi-target compiler.
A compiler is a computer program (or set of programs) that translates text written
in a computer language (the source language) into another computer language (the
target language). The original sequence is usually called the source code and the
output called object code. Commonly the output has a form suitable for processing
by other programs (e.g., a linker), but it may be a human-readable text file.
The most common reason for wanting to translate source code is to create an
executable program. The name "compiler" is primarily used for programs that
translate source code from a high-level programming language to a lower level
language (e.g., assembly language or machine language). A program that translates
from a low level language to a higher level one is a decompiler. A program that
translates between high-level languages is usually called a language translator,
138
source to source translator, or language converter. A language rewriter is usually a
program that translates the form of expressions without a change of language.
A compiler is likely to perform many or all of the following operations: lexical
analysis, preprocessing, parsing, semantic analysis, code generation, and code
optimization.
History
Software for early computers was exclusively written in assembly language for many
years. Higher level programming languages were not invented until the benefits of
being able to reuse software on different kinds of CPUs started to become
significantly greater than the cost of writing a compiler. The very limited memory
capacity of early computers also created many technical problems when
implementing a compiler.
Towards the end of the 1950s, machine-independent programming languages were
first proposed. Subsequently, several experimental compilers were developed. The
first compiler was written by Grace Hopper, in 1952, for the A-0 programming
language. The FORTRAN team led by John Backus at IBM is generally credited as
having introduced the first complete compiler, in 1957. COBOL was an early
language to be compiled on multiple architectures, in 1960.
In many application domains the idea of using a higher level language quickly caught
on. Because of the expanding functionality supported by newer programming
languages and the increasing complexity of computer architectures, compilers have
become more and more complex.
Early compilers were written in assembly language. The first self-hosting compiler —
capable of compiling its own source code in a high-level language — was created for
Lisp by Hart and Levin at MIT in 1962. Since the 1970s it has become common
practice to implement a compiler in the language it compiles, although both Pascal
and C have been popular choices for implementation language. Building a self-
hosting compiler is a bootstrapping problem -- the first such compiler for a language
must be compiled either by a compiler written in a different language, or (as in Hart
and Levin's Lisp compiler) compiled by running the compiler in an interpreter.
139
Compilers in education
Compiler construction and compiler optimization are taught at universities as part of
the computer science curriculum. Such courses are usually supplemented with the
implementation of a compiler for an educational programming language. A well-
documented example is Niklaus Wirth's PL/0 compiler, which Wirth used to teach
compiler construction in the 1970s. In spite of its simplicity, the PL/0 compiler
introduced several influential concepts to the field:
1. Program development by stepwise refinement
2. The use of a recursive descent parser
3. The use of EBNF to specify the syntax of a language
4. A code generator producing portable P-code
5. The use of T-diagrams in the formal description of the bootstrapping problem
Compiler output
One method used to classify compilers is by the platform on which their generated
code executes. This is known as the target platform.
A native or hosted compiler is one whose output is intended to directly run on the
same type of computer and operating system as the compiler itself runs on. The
output of a cross compiler is designed to run on a different platform. Cross compilers
are often used when developing software for embedded systems that are not
intended to support a software development environment.
The output of a compiler that produces code for a virtual machine (VM) may or may
not be executed on the same platform as the compiler that produced it. For this
reason such compilers are not usually classified as native or cross compilers.
Compiled versus interpreted languages
Higher-level programming languages are generally divided for convenience into
compiled languages and interpreted languages. However, there is rarely anything
about a language that requires it to be exclusively compiled, or exclusively
interpreted. The categorization usually reflects the most popular or widespread
implementations of a language — for instance, BASIC is thought of as an interpreted
140
language, and C a compiled one, despite the existence of BASIC compilers and C
interpreters.
In a sense, all languages are interpreted, with "execution" being merely a special
case of interpretation performed by transistors switching on a CPU. Modern trends
toward just-in-time compilation and bytecode interpretation also blur the traditional
categorizations.
There are exceptions. Some language specifications spell out that implementations
must include a compilation facility; for example, Common Lisp. Other languages
have features that are very easy to implement in an interpreter, but make writing a
compiler much harder; for example, APL, SNOBOL4, and many scripting languages
allow programs to construct arbitrary source code at runtime with regular string
operations, and then execute that code by passing it to a special evaluation function.
To implement these features in a compiled language, programs must usually be
shipped with a runtime library that includes a version of the compiler itself.
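As a hedged illustration of the runtime code construction described above, the Python
fragment below builds a small piece of source text with ordinary string operations and
hands it to the language's built-in evaluation function; the variable names are
illustrative only.

    operation = "+"                                  # chosen at runtime, e.g. from user input
    source = "lambda a, b: a " + operation + " b"    # build source code as a string
    combine = eval(source)                           # evaluate the string into a callable object
    print(combine(2, 3))                             # prints 5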
Hardware compilation
The output of some compilers may target hardware at a very low level, for example
a Field Programmable Gate Array (FPGA) or a structured Application-Specific Integrated
circuit (ASIC). Such compilers are said to be hardware compilers or synthesis tools
because the programs they compile effectively control the final configuration of the
hardware and how it operates; there are no instructions that are executed in
sequence - only an interconnection of transistors or lookup tables. For example, XST
is the Xilinx Synthesis Tool used for configuring FPGAs. Similar tools are available
from Altera, Synplicity, Synopsys and other vendors.
Compiler design
The approach taken to compiler design is affected by the complexity of the
processing that needs to be done, the experience of the person(s) designing it, and
the resources (e.g., people and tools) available.
A compiler for a relatively simple language written by one person might be a single,
monolithic piece of software. When the source language is large and complex, and
high quality output is required, the design may be split into a number of relatively
141
independent phases, or passes. Having separate phases means development can be
parceled up into small parts and given to different people. It also becomes much
easier to replace a single phase by an improved one, or to insert new phases later
(e.g., additional optimizations).
The division of the compilation process into phases (or passes) was championed by
the Production Quality Compiler-Compiler Project (PQCC) at Carnegie Mellon
University. This project introduced the terms front end, middle end (rarely heard
today), and back end.
All but the smallest of compilers have more than two phases. However, these phases
are usually regarded as being part of the front end or the back end. The point at
where these two ends meet is always open to debate. The front end is generally
considered to be where syntactic and semantic processing takes place, along with
translation to a lower level of representation (than source code).
The middle end is usually designed to perform optimizations on a form other than
the source code or machine code. This source code/machine code independence is
intended to enable generic optimizations to be shared between versions of the
compiler supporting different languages and target processors.
The back end takes the output from the middle end. It may perform more analysis,
transformations and optimizations that are for a particular computer. Then, it
generates code for a particular processor and OS.
This front-end/middle/back-end approach makes it possible to combine front ends
for different languages with back ends for different CPUs. Practical examples of this
approach are the GNU Compiler Collection, LLVM, and the Amsterdam Compiler Kit,
which have multiple front-ends, shared analysis and multiple back-ends.
One-pass versus multi-pass compilers
Classifying compilers by number of passes has its background in the hardware
resource limitations of computers. Compiling involves performing lots of work and
early computers did not have enough memory to contain one program that did all of
this work. So compilers were split up into smaller programs which each made a pass
142
over the source (or some representation of it) performing some of the required
analysis and translations.
The ability to compile in a single pass is often seen as a benefit because it simplifies
the job of writing a compiler and one pass compilers are generally faster than multi-
pass compilers. Many languages were designed so that they could be compiled in a
single pass (e.g., Pascal).
In some cases the design of a language feature may require a compiler to perform
more than one pass over the source. For instance, consider a declaration appearing
on line 20 of the source which affects the translation of a statement appearing on
line 10. In this case, the first pass needs to gather information about declarations
appearing after statements that they affect, with the actual translation happening
during a subsequent pass.
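A minimal hedged sketch of why a second pass helps, written in Python rather than in
any particular compiler: the first pass records where each label is declared, and the
second pass translates the references, including references that appear before their
declarations. The instruction format and names are invented for illustration.

    program = ["jump end", "work", "end:", "work"]

    # Pass 1: gather information about declarations (the position of each label).
    labels = {}
    for index, line in enumerate(program):
        if line.endswith(":"):
            labels[line[:-1]] = index

    # Pass 2: translate, now that forward references can be resolved.
    translated = []
    for line in program:
        if line.startswith("jump "):
            translated.append(("JUMP", labels[line.split()[1]]))
        elif not line.endswith(":"):
            translated.append(("WORK",))
    print(translated)   # [('JUMP', 2), ('WORK',), ('WORK',)]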
The disadvantage of compiling in a single pass is that it is not possible to perform
many of the sophisticated optimizations needed to generate high quality code. It can
be difficult to count exactly how many passes an optimizing compiler makes. For
instance, different phases of optimization may analyze one expression many times
but only analyze another expression once.
Splitting a compiler up into small programs is a technique used by researchers
interested in producing provably correct compilers. Proving the correctness of a set
of small programs often requires less effort than proving the correctness of a larger,
single, equivalent program.
While the typical multi-pass compiler outputs machine code from its final pass, there
are several other types:
A "source-to-source compiler" is a type of compiler that takes a high level
language as its input and outputs a high level language. For example, an
automatic parallelizing compiler will frequently take in a high level language
program as an input and then transform the code and annotate it with parallel
code annotations (e.g. OpenMP) or language constructs (e.g. Fortran's DOALL
statements).
Stage compiler that compiles to assembly language of a theoretical machine,
like some Prolog implementations
143
o This Prolog machine is also known as the Warren Abstract Machine (or
WAM). Bytecode compilers for Java, Python, and many more are also a
subtype of this.
Just-in-time compiler, used by Smalltalk and Java systems, and also by
Microsoft .Net's Common Intermediate Language (CIL)
o Applications are delivered in bytecode, which is compiled to native
machine code just prior to execution.
Front end
The front end analyzes the source code to build an internal representation of the
program, called the intermediate representation or IR. It also manages the symbol
table, a data structure mapping each symbol in the source code to associated
information such as location, type and scope. This is done over several phases,
which includes some of the following:
1. Line reconstruction. Languages which strop their keywords or allow
arbitrary spaces within identifiers require a phase before parsing, which
converts the input character sequence to a canonical form ready for the
parser. The top-down, recursive-descent, table-driven parsers used in the
1960s typically read the source one character at a time and did not require a
separate tokenizing phase. Atlas Autocode, and Imp (and some
implementations of Algol and Coral66) are examples of stropped languages
whose compilers would have a Line Reconstruction phase.
2. Lexical analysis breaks the source code text into small pieces called tokens.
Each token is a single atomic unit of the language, for instance a keyword,
identifier or symbol name. The token syntax is typically a regular language,
so a finite state automaton constructed from a regular expression can be used
to recognize it. This phase is also called lexing or scanning, and the software
doing lexical analysis is called a lexical analyzer or scanner (a small scanner is
sketched after this list).
3. Preprocessing. Some languages, e.g., C, require a preprocessing phase which
supports macro substitution and conditional compilation. Typically the
preprocessing phase occurs before syntactic or semantic analysis; e.g. in the
case of C, the preprocessor manipulates lexical tokens rather than syntactic
forms. However, some languages such as Scheme support macro
substitutions based on syntactic forms.
144
4. Syntax analysis involves parsing the token sequence to identify the syntactic
structure of the program. This phase typically builds a parse tree, which
replaces the linear sequence of tokens with a tree structure built according to
the rules of a formal grammar which define the language's syntax. The parse
tree is often analyzed, augmented, and transformed by later phases in the
compiler.
5. Semantic analysis is the phase in which the compiler adds semantic
information to the parse tree and builds the symbol table. This phase
performs semantic checks such as type checking (checking for type errors),
or object binding (associating variable and function references with their
definitions), or definite assignment (requiring all local variables to be
initialized before use), rejecting incorrect programs or issuing warnings.
Semantic analysis usually requires a complete parse tree, meaning that this
phase logically follows the parsing phase, and logically precedes the code
generation phase, though it is often possible to fold multiple phases into one
pass over the code in a compiler implementation.
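To make the lexical-analysis step in item 2 concrete, here is a hedged sketch of a very
small scanner in Python. It uses a regular expression to break an arithmetic expression
into tokens, matching the statement above that token syntax is typically a regular
language; the token names and example input are invented for illustration.

    import re

    TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|([+\-*/=()]))")

    def tokenize(source):
        """Break source text into (kind, text) tokens: NUMBER, IDENTIFIER or SYMBOL."""
        tokens, position = [], 0
        while position < len(source):
            match = TOKEN_PATTERN.match(source, position)
            if not match:
                raise SyntaxError("unexpected character at position " + str(position))
            number, identifier, symbol = match.groups()
            if number:
                tokens.append(("NUMBER", number))
            elif identifier:
                tokens.append(("IDENTIFIER", identifier))
            else:
                tokens.append(("SYMBOL", symbol))
            position = match.end()
        return tokens

    print(tokenize("total = price * 3"))
    # [('IDENTIFIER', 'total'), ('SYMBOL', '='), ('IDENTIFIER', 'price'),
    #  ('SYMBOL', '*'), ('NUMBER', '3')]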
Back end
The term back end is sometimes confused with code generator because of the
overlapped functionality of generating assembly code. Some literature uses middle
end to distinguish the generic analysis and optimization phases in the back end from
the machine-dependent code generators.
The main phases of the back end include the following:
1. Analysis: This is the gathering of program information from the intermediate
representation derived from the input. Typical analyses are data flow analysis
to build use-define chains, dependence analysis, alias analysis, pointer
analysis, escape analysis etc. Accurate analysis is the basis for any compiler
optimization. The call graph and control flow graph are usually also built
during the analysis phase.
2. Optimization: the intermediate language representation is transformed into
functionally equivalent but faster (or smaller) forms. Popular optimizations
are inline expansion, dead code elimination, constant propagation, loop
transformation, register allocation or even automatic parallelization.
145
3. Code generation: the transformed intermediate language is translated into
the output language, usually the native machine language of the system. This
involves resource and storage decisions, such as deciding which variables to
fit into registers and memory and the selection and scheduling of appropriate
machine instructions along with their associated addressing modes (see also
Sethi-Ullman algorithm).
Compiler analysis is the prerequisite for any compiler optimization, and the two work
tightly together. For example, dependence analysis is crucial for loop transformation.
In addition, the scope of compiler analysis and optimizations vary greatly, from as
small as a basic block to the procedure/function level, or even over the whole
program (interprocedural optimization). Obviously, a compiler can potentially do a
better job using a broader view. But that broad view is not free: large scope analysis
and optimizations are very costly in terms of compilation time and memory space;
this is especially true for interprocedural analysis and optimizations.
The existence of interprocedural analysis and optimizations is common in modern
commercial compilers from HP, IBM, SGI, Intel, Microsoft, and Sun Microsystems.
The open source GCC was criticized for a long time for lacking powerful
interprocedural optimizations, but it is changing in this respect. Another good open
source compiler with full analysis and optimization infrastructure is Open64, which is
used by many organizations for research and commercial purposes.
Due to the extra time and space needed for compiler analysis and optimizations,
some compilers skip them by default. Users have to use compilation options to
explicitly tell the compiler which optimizations should be enabled.
Related techniques
Assembly language is not a high-level language and a program that compiles it is
more commonly known as an assembler, with the inverse program known as a
disassembler.
A program that translates from a low level language to a higher level one is a
decompiler.
146
A program that translates between high-level languages is usually called a language
translator, source to source translator, language converter, or language rewriter. The
last term is usually applied to translations that do not involve a change of language.
147
Network topology
Diagram of different network topologies.
Network topology is the study of the arrangement or mapping of the elements
(links, nodes, etc.) of a network, especially the physical (real) and logical (virtual)
interconnections between nodes.
A local area network (LAN) is one example of a network that exhibits both a physical
topology and a logical topology. Any given node in the LAN will have one or more
links to one or more other nodes in the network and the mapping of these links and
nodes onto a graph results in a geometrical shape that determines the physical
topology of the network. Likewise, the mapping of the flow of data between the
nodes in the network determines the logical topology of the network. It is important
to note that the physical and logical topologies might be identical in any particular
network but they also may be different.
Any particular network topology is determined only by the graphical mapping of the
configuration of physical and/or logical connections between nodes - Network
Topology is, therefore, technically a part of graph theory. Distances between nodes,
physical interconnections, transmission rates, and/or signal types may differ in two
networks and yet their topologies may be identical.
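Since a topology is, as noted, essentially a graph, it can be written down as an
adjacency structure. The small Python sketch below records a four-node ring and a
four-node star as dictionaries mapping each node to the nodes it links to; the node
names are arbitrary and only illustrate the idea.

    # Four nodes connected in a ring: each node links to exactly two neighbours.
    ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}

    # The same number of nodes in a star: every peripheral node links only to the hub.
    star = {"hub": ["A", "B", "C"], "A": ["hub"], "B": ["hub"], "C": ["hub"]}

    # The physical wiring differs, but both are just mappings of nodes to links.
    for name, topology in (("ring", ring), ("star", star)):
        links = sum(len(neighbours) for neighbours in topology.values()) // 2
        print(name, "has", len(topology), "nodes and", links, "links")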
148
Basic types of topologies
The arrangement or mapping of the elements of a network gives rise to certain basic
topologies which may then be combined to form more complex topologies (hybrid
topologies). The most common of these basic types of topologies are (refer to the
illustration at the top right of this page):
Bus (Linear, Linear Bus)
Star
Ring
Mesh
o partially connected mesh (or simply 'mesh')
o fully connected mesh
Tree
Hybrid
Classification of network topologies
There are also three basic categories of network topologies:
physical topologies
signal topologies
logical topologies
The terms signal topology and logical topology are often used interchangeably, even
though there is a subtle difference between the two; the distinction is not often
made in practice.
Physical topologies
The mapping of the nodes of a network and the physical connections between them
– i.e., the layout of wiring, cables, the locations of nodes, and the interconnections
between the nodes and the cabling or wiring system.
149
Classification of physical topologies
Point-to-point
The simplest topology is a permanent link between two endpoints. Switched point-to-
point topologies are the basic model of conventional telephony. The value of a
permanent point-to-point network is the value of guaranteed, or nearly so,
communications between the two endpoints. The value of an on-demand point-to-
point connection is proportional to the number of potential pairs of subscribers, and
has been expressed as Metcalfe's Law.
Permanent (dedicated)
Easiest to understand, of the variations of point-to-point topology, is a point-to-point
communications channel that appears, to the user, to be permanently associated
with the two endpoints. A children's "tin-can telephone" is one example; a
microphone connected to a single public address speaker is another. These are examples of
physical dedicated channels. Within many switched telecommunications systems, it is
possible to establish a permanent circuit. One example might be a telephone in the
lobby of a public building, which is programmed to ring only the number of a
telephone dispatcher. "Nailing down" a switched connection saves the cost of running
a physical circuit between the two points. The resources in such a connection can be
released when no longer needed, as, for example, a television circuit from a parade
route back to the studio.
Switched
Using circuit-switching or packet-switching technologies, a point-to-point circuit can
be set up dynamically, and dropped when no longer needed. This is the basic mode
of conventional telephony.
Bus
Linear bus
The type of network topology in which all of the nodes of the network are
connected to a common transmission medium which has exactly two
150
endpoints (this is the 'bus', which is also commonly referred to as the
backbone, or trunk) – all data that is transmitted between nodes in the
network is transmitted over this common transmission medium and is able to
be received by all nodes in the network virtually simultaneously (disregarding
propagation delays).
Note: The two endpoints of the common transmission medium are normally
terminated with a device called a terminator that exhibits the characteristic
impedance of the transmission medium and which dissipates or absorbs the
energy that remains in the signal to prevent the signal from being reflected or
propagated back onto the transmission medium in the opposite direction,
which would cause interference with and degradation of the signals on the
transmission medium (See Electrical termination).
Distributed bus
The type of network topology in which all of the nodes of the network are
connected to a common transmission medium which has more than two
endpoints that are created by adding branches to the main section of the
transmission medium – the physical distributed bus topology functions in
exactly the same fashion as the physical linear bus topology (i.e., all nodes
share a common transmission medium).
Notes:
1.) All of the endpoints of the common transmission medium are normally
terminated with a device called a 'terminator' (see the note under linear bus).
2.) The physical linear bus topology is sometimes considered to be a special
case of the physical distributed bus topology – i.e., a distributed bus with no
branching segments.
3.) The physical distributed bus topology is sometimes incorrectly referred to
as a physical tree topology – however, although the physical distributed bus
topology resembles the physical tree topology, it differs from the physical tree
topology in that there is no central node to which any other nodes are
connected, since this hierarchical functionality is replaced by the common
bus.
151
Star
The type of network topology in which each of the nodes of the network is
connected to a central node with a point-to-point link in a 'hub' and 'spoke'
fashion, the central node being the 'hub' and the nodes that are attached to
the central node being the 'spokes' (e.g., a collection of point-to-point links
from the peripheral nodes that converge at a central node) – all data that is
transmitted between nodes in the network is transmitted to this central node,
which is usually some type of device that then retransmits the data to some
or all of the other nodes in the network, although the central node may also
be a simple common connection point (such as a 'punch-down' block) without
any active device to repeat the signals.
Notes:
1.) A point-to-point link (described above) is sometimes categorized as a
special instance of the physical star topology – therefore, the simplest type of
network that is based upon the physical star topology would consist of one
node with a single point-to-point link to a second node, the choice of which
node is the 'hub' and which node is the 'spoke' being arbitrary[1].
2.) After the special case of the point-to-point link, as in note 1.) above, the
next simplest type of network that is based upon the physical star topology
would consist of one central node – the 'hub' – with two separate point-to-
point links to two peripheral nodes – the 'spokes'.
3.) Although most networks that are based upon the physical star topology
are commonly implemented using a special device such as a hub or switch as
the central node (i.e., the 'hub' of the star), it is also possible to implement a
network that is based upon the physical star topology using a computer or
even a simple common connection point as the 'hub' or central node –
however, since many illustrations of the physical star network topology depict
the central node as one of these special devices, some confusion is possible,
since this practice may lead to the misconception that a physical star network
requires the central node to be one of these special devices, which is not true
because a simple network consisting of three computers connected as in note
2.) above also has the topology of the physical star.
4.) Star networks may also be described as either broadcast multi-access or
nonbroadcast multi-access (NBMA), depending on whether the technology of
152
the network either automatically propagates a signal at the hub to all spokes,
or only addresses individual spokes with each communication.
Extended star
A type of network topology in which a network that is based upon the physical
star topology has one or more repeaters between the central node (the 'hub'
of the star) and the peripheral or 'spoke' nodes, the repeaters being used to
extend the maximum transmission distance of the point-to-point links
between the central node and the peripheral nodes beyond that which is
supported by the transmitter power of the central node or beyond that which
is supported by the standard upon which the physical layer of the physical
star network is based.
Note: If the repeaters in a network that is based upon the physical extended
star topology are replaced with hubs or switches, then a hybrid network
topology is created that is referred to as a physical hierarchical star topology,
although some texts make no distinction between the two topologies.
Distributed Star
A type of network topology that is composed of individual networks that are
based upon the physical star topology connected together in a linear fashion –
i.e., 'daisy-chained' – with no central or top level connection point (e.g., two
or more 'stacked' hubs, along with their associated star connected nodes or
'spokes').
Ring
The type of network topology in which each of the nodes of the network is
connected to two other nodes in the network and with the first and last nodes
being connected to each other, forming a ring – all data that is transmitted
between nodes in the network travels from one node to the next node in a
circular manner and the data generally flows in a single direction only.
153
Dual-ring
The type of network topology in which each of the nodes of the network is
connected to two other nodes in the network, with two connections to each of
these nodes, and with the first and last nodes being connected to each other
with two connections, forming a double ring – the data flows in opposite
directions around the two rings, although, generally, only one of the rings
carries data during normal operation, and the two rings are independent
unless there is a failure or break in one of the rings, at which time the two
rings are joined (by the stations on either side of the fault) to enable the flow
of data to continue using a segment of the second ring to bypass the fault in
the primary ring.
Mesh
According to Reed's Law, the value of a fully meshed network grows exponentially with the number of subscribers, because any group of endpoints, from any two endpoints up to and including all of the endpoints, is able to communicate over the mesh.
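As a rough illustration of this growth, the following sketch (not part of the original text; the endpoint counts are arbitrary) computes the number of possible communicating groups of two or more endpoints, 2^n − n − 1, which is the quantity Reed's Law uses to value the network.

```python
# Minimal sketch: Reed's Law counts every possible communicating group of two
# or more endpoints, so the value of a fully meshed network grows exponentially.
# groups(n) = 2**n - n - 1   (all subsets of n endpoints, minus the empty set
#                             and the n single-endpoint subsets)

def reeds_law_groups(n: int) -> int:
    return 2 ** n - n - 1

for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} endpoints -> {reeds_law_groups(n)} possible communicating groups")
```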
Full
Fully connected
The type of network topology in which each of the nodes of the network is
connected to each of the other nodes in the network with a point-to-point link
– this makes it possible for data to be simultaneously transmitted from any
single node to all of the other nodes.
Note: The physical fully connected mesh topology is generally too costly and
complex for practical networks, although the topology is used when there are
only a small number of nodes to be interconnected.
Partial
Partially connected
The type of network topology in which some of the nodes of the network are
connected to more than one other node in the network with a point-to-point
link – this makes it possible to take advantage of some of the redundancy
that is provided by a physical fully connected mesh topology without the
expense and complexity required for a connection between every node in the
network.
Note: In most practical networks that are based upon the physical partially
connected mesh topology, all of the data that is transmitted between nodes in
the network takes the shortest path (or an approximation of the shortest
path) between nodes, except in the case of a failure or break in one of the
links, in which case the data takes an alternate path to the destination. This
requires that the nodes of the network possess some type of logical 'routing'
algorithm to determine the correct path to use at any particular time.
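The routing behaviour described in this note can be sketched with a simple breadth-first search over an adjacency list; the topology, node names, and the route() helper below are illustrative assumptions rather than any particular routing protocol.

```python
from collections import deque

# Minimal sketch of shortest-path selection in a partially connected mesh.
# The topology and node names are made up for illustration.
mesh = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}

def route(topology, source, destination, failed_links=frozenset()):
    """Return a shortest path (fewest hops) from source to destination,
    avoiding any links listed in failed_links, or None if unreachable."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == destination:
            return path
        for neighbour in topology[node]:
            link = frozenset((node, neighbour))
            if neighbour not in visited and link not in failed_links:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

print(route(mesh, "A", "D"))                                        # e.g. ['A', 'B', 'D']
print(route(mesh, "A", "D", failed_links={frozenset(("B", "D"))}))  # alternate path via C
```

When the B–D link is marked as failed, the same search simply returns the next-shortest available path, which is the fallback behaviour the note describes.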
Tree (also known as hierarchical):
The type of network topology in which a central 'root' node (the top level of
the hierarchy) is connected to one or more other nodes that are one level
lower in the hierarchy (i.e., the second level) with a point-to-point link
between each of the second level nodes and the top level central 'root' node,
while each of the second level nodes that are connected to the top level
central 'root' node will also have one or more other nodes that are one level
lower in the hierarchy (i.e., the third level) connected to it, also with a point-
to-point link, the top level central 'root' node being the only node that has no
other node above it in the hierarchy – the hierarchy of the tree is
symmetrical, each node in the network having a specific fixed number, f, of
nodes connected to it at the next lower level in the hierarchy, the number, f,
being referred to as the 'branching factor' of the hierarchical tree.
Notes:
1.) A network that is based upon the physical hierarchical topology must have
at least three levels in the hierarchy of the tree, since a network with a
central 'root' node and only one hierarchical level below it would exhibit the
physical topology of a star.
2.) A network that is based upon the physical hierarchical topology and with a
branching factor of 1 would be classified as a physical linear topology.
3.) The branching factor, f, is independent of the total number of nodes in the
network and, therefore, if the nodes in the network require ports for
connection to other nodes, the total number of ports per node may be kept
low even though the total number of nodes is large – the cost of adding ports
to each node thus depends only upon the branching factor and may be kept as
low as required without any effect upon the total number of nodes that are
possible.
4.) The total number of point-to-point links in a network that is based upon
the physical hierarchical topology will be one less than the total number of
nodes in the network (see the sketch following these notes).
5.) If the nodes in a network that is based upon the physical hierarchical
topology are required to perform any processing upon the data that is
transmitted between nodes in the network, the nodes that are at higher levels
in the hierarchy will be required to perform more processing operations on
behalf of other nodes than the nodes that are lower in the hierarchy.
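The counts mentioned in notes 3.) and 4.) follow directly from the branching factor; a minimal sketch, assuming an arbitrary branching factor and level count, is given below.

```python
# Minimal sketch: node and link counts in a symmetrical hierarchical (tree) topology.
# With branching factor f and L levels, level i (counting the root as level 1)
# holds f**(i-1) nodes, and every node except the root has exactly one uplink.

def tree_node_count(f: int, levels: int) -> int:
    """Total nodes in a symmetrical tree with branching factor f and the given levels."""
    return sum(f ** (i - 1) for i in range(1, levels + 1))

def tree_link_count(f: int, levels: int) -> int:
    """Total point-to-point links: one per node except the root (note 4)."""
    return tree_node_count(f, levels) - 1

f, levels = 3, 4   # arbitrary example values
print(tree_node_count(f, levels))   # 1 + 3 + 9 + 27 = 40 nodes
print(tree_link_count(f, levels))   # 39 links, one less than the number of nodes
```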
Hybrid network topologies
The hybrid topology is a type of network topology that is composed of one or
more interconnections of two or more networks that are based upon different
physical topologies, or of two or more networks that are based upon the same
physical topology but where the physical topology of the resulting network
does not meet the definition of the original physical topology of the
interconnected networks. For example, an interconnection of two or more
networks that are based upon the physical star topology might create a hybrid
topology which resembles a mixture of the physical star and physical bus
topologies, or a mixture of the physical star and the physical tree topologies,
depending upon how the individual networks are interconnected, while an
interconnection of two or more networks that are based upon the physical
distributed bus topology retains the topology of a physical distributed bus
network.
Star-bus
A type of network topology in which the central nodes of one or more
individual networks that are based upon the physical star topology are
connected together using a common 'bus' network whose physical topology is
based upon the physical linear bus topology, the endpoints of the common
'bus' being terminated with the characteristic impedance of the transmission
medium where required – e.g., two or more hubs connected to a common
backbone with drop cables through the port on the hub that is provided for
that purpose (e.g., a properly configured 'uplink' port) would comprise the
physical bus portion of the physical star-bus topology, while each of the
individual hubs, combined with the individual nodes which are connected to
them, would comprise the physical star portion of the physical star-bus
topology.
Star-of-stars
Hierarchical star
A type of network topology that is composed of an interconnection of
individual networks that are based upon the physical star topology connected
together in a hierarchical fashion to form a more complex network – e.g., a
top level central node which is the 'hub' of the top level physical star topology
and to which other second level central nodes are attached as the 'spoke'
nodes, each of which, in turn, may also become the central nodes of a third
level physical star topology.
Notes:
1.) The physical hierarchical star topology is not a combination of the physical
linear bus and the physical star topologies, as cited in some texts, as there is
no common linear bus within the topology, although the top level 'hub' which
is the beginning of the physical hierarchical star topology may be connected
to the backbone of another network, such as a common carrier, which is,
topologically, not considered to be a part of the local network – if the top level
central node is connected to a backbone that is considered to be a part of the
local network, then the resulting network topology would be considered to be
a hybrid topology that is a mixture of the topology of the backbone network
and the physical hierarchical star topology.
2.) The physical hierarchical star topology is also sometimes incorrectly
referred to as a physical tree topology, since its physical topology is
hierarchical; however, the physical hierarchical star topology does not have
structure that is determined by a branching factor, as is the case with the
physical tree topology and, therefore, nodes may be added to, or removed
from, any node that is the 'hub' of one of the individual physical star topology
networks within a network that is based upon the physical hierarchical star
topology.
3.) The physical hierarchical star topology is commonly used in 'outside plant'
(OSP) cabling to connect various buildings to a central connection facility,
which may also house the 'demarcation point' for the connection to the data
transmission facilities of a common carrier, and in 'inside plant' (ISP) cabling
to connect multiple wiring closets within a building to a common wiring closet
within the same building, which is also generally where the main backbone or
trunk that connects to a larger network, if any, enters the building.
Star-wired ring
A type of hybrid physical network topology that is a combination of the
physical star topology and the physical ring topology, the physical star portion
of the topology consisting of a network in which each of the nodes of the
network is connected to a central node with a point-to-
point link in a 'hub' and 'spoke' fashion, the central node being the 'hub' and
the nodes that are attached to the central node being the 'spokes' (e.g., a
collection of point-to-point links from the peripheral nodes that converge at a
central node) in a fashion that is identical to the physical star topology, while
the physical ring portion of the topology consists of circuitry within the central
node which routes the signals on the network to each of the connected nodes
sequentially, in a circular fashion.
Note: In an 802.5 Token Ring network the central node is called a
Multistation Access Unit (MAU).
Hybrid mesh
A type of hybrid physical network topology that is a combination of the
physical partially connected mesh topology and one or more other physical
topologies, the mesh portion of the topology consisting of redundant or
alternate connections between some of the nodes in the network – the
physical hybrid mesh topology is commonly used in networks which require a
high degree of availability.
Signal topology
The mapping of the actual connections between the nodes of a network, as
evidenced by the path that the signals take when propagating between the nodes.
Note: The term 'signal topology' is often used synonymously with the term
'logical topology'; however, some confusion may result from this practice in
certain situations since, by definition, the term 'logical topology' refers to the
apparent path that the data takes between nodes in a network while the term
'signal topology' generally refers to the actual path that the signals (e.g.,
optical, electrical, electromagnetic, etc.) take when propagating between
nodes.
Example
In an 802.4 Token Bus network, the physical topology may be a physical bus,
a physical star, or a hybrid physical topology, while the signal topology is a
bus (i.e., the electrical signal propagates to all nodes simultaneously [ignoring
propagation delays and network latency] ), and the logical topology is a ring
(i.e., the data flows from one node to the next in a circular manner according
to the protocol).
Logical topology
The mapping of the apparent connections between the nodes of a network, as
evidenced by the path that data appears to take when traveling between the nodes.
Classification of logical topologies
The logical classification of network topologies generally follows the same
classifications as those in the physical classifications of network topologies, the path
that the data takes between nodes being used to determine the topology as opposed
to the actual physical connections being used to determine the topology.
Notes:
1.) Logical topologies are often closely associated with media access control
(MAC) methods and protocols.
2.) The logical topologies are generally determined by network protocols as
opposed to being determined by the physical layout of cables, wires, and
network devices or by the flow of the electrical signals, although in many
cases the paths that the electrical signals take between nodes may closely
match the logical flow of data, hence the convention of using the terms
'logical topology' and 'signal topology' interchangeably.
3.) Logical topologies are able to be dynamically reconfigured by special types
of equipment such as routers and switches.
Daisy chains
Except for star-based networks, the easiest way to add more computers into a
network is by daisy-chaining, or connecting each computer in series to the next. If a
message is intended for a computer partway down the line, each system bounces it
along in sequence until it reaches the destination. A daisy-chained network can take
two basic forms: linear and ring.
A linear topology puts a two-way link between one computer and the next.
However, this was expensive in the early days of computing, since each
computer (except for the ones at each end) required two receivers and two
transmitters.
By connecting the computers at each end, a ring topology can be formed. An
advantage of the ring is that the number of transmitters and receivers can be
cut in half, since a message will eventually loop all of the way around. When a
node sends a message, the message is processed by each computer in the
ring. If a computer is not the destination node, it will pass the message to the
next node, until the message arrives at its destination. If the message is not
accepted by any node on the network, it will travel around the entire ring and
return to the sender. This potentially results in a doubling of the travel time
for data, but since the signal propagates at a substantial fraction of the
speed of light, the added delay is usually negligible.
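As a rough sketch of the ring behaviour just described, the following example forwards a message around a small ring one node at a time; the node names and the return-to-sender rule are assumptions made for illustration.

```python
# Minimal sketch: forwarding a message around a ring, one node at a time.
# If no node accepts the message, it travels the whole ring back to the sender.
ring = ["A", "B", "C", "D"]   # nodes in ring order (illustrative)

def send(ring, sender, destination):
    """Forward a message node-by-node in one direction until it is accepted
    or returns to the sender."""
    position = ring.index(sender)
    hops = 0
    while True:
        position = (position + 1) % len(ring)
        hops += 1
        current = ring[position]
        if current == destination:
            return f"delivered to {current} after {hops} hop(s)"
        if current == sender:
            return f"not accepted; returned to {sender} after {hops} hop(s)"

print(send(ring, "A", "C"))   # delivered to C after 2 hop(s)
print(send(ring, "A", "Z"))   # not accepted; returned to A after 4 hop(s)
```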
Centralization
The star topology reduces the probability of a network failure by connecting all of the
peripheral nodes (computers, etc.) to a central node. When the physical star
topology is applied to a logical bus network such as Ethernet, this central node
(traditionally a hub) rebroadcasts all transmissions received from any peripheral
node to all peripheral nodes on the network, sometimes including the originating
node. All peripheral nodes may thus communicate with all others by transmitting to,
and receiving from, the central node only. The failure of a transmission line linking
any peripheral node to the central node will result in the isolation of that peripheral
node from all others, but the remaining peripheral nodes will be unaffected.
However, the disadvantage is that the failure of the central node will cause the
failure of all of the peripheral nodes also.
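A hub's rebroadcast behaviour can be sketched as a loop over its attached nodes; whether the originating node also receives a copy of its own transmission varies, as noted above, so the echo_to_sender flag below is an illustrative assumption.

```python
# Minimal sketch of a hub on a logical bus: every frame received from one node
# is repeated to the peripheral nodes on all other ports (optionally echoing it
# back to the sender as well, as some hubs do).
class Hub:
    def __init__(self, echo_to_sender: bool = False):
        self.ports = {}              # port number -> node name
        self.echo_to_sender = echo_to_sender

    def attach(self, port: int, node: str) -> None:
        self.ports[port] = node

    def rebroadcast(self, sender: str, frame: str) -> list[tuple[str, str]]:
        """Return (node, frame) pairs for every node that receives the frame."""
        return [(node, frame)
                for node in self.ports.values()
                if node != sender or self.echo_to_sender]

hub = Hub()
for port, node in enumerate(["alpha", "beta", "gamma"]):
    hub.attach(port, node)
print(hub.rebroadcast("alpha", "hello"))   # [('beta', 'hello'), ('gamma', 'hello')]
```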
If the central node is passive, the originating node must be able to tolerate the
reception of an echo of its own transmission, delayed by the two-way round trip
transmission time (i.e. to and from the central node) plus any delay generated in the
central node. An active star network has an active central node that usually has the
means to prevent echo-related problems.
A tree topology (a.k.a. hierarchical topology) can be viewed as a collection of star
networks arranged in a hierarchy. This tree has individual peripheral nodes (e.g.
leaves) which are required to transmit to and receive from one other node only and
are not required to act as repeaters or regenerators. Unlike the star network, the
functionality of the central node may be distributed.
As in the conventional star network, individual nodes may thus still be isolated from
the network by a single-point failure of a transmission path to the node. If a link
connecting a leaf fails, that leaf is isolated; if a connection to a non-leaf node fails,
an entire section of the network becomes isolated from the rest.
In order to reduce the amount of network traffic that results from broadcasting all
signals to all nodes, more advanced central nodes were developed that are able to
keep track of the identities of the nodes that are connected to the network. These
network switches will "learn" the layout of the network by "listening" on each port
during normal data transmission, examining the data packets and recording the
address/identifier of each connected node and which port it's connected to in a
lookup table held in memory. This lookup table then allows future transmissions to
be forwarded to the intended destination only.
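The learning behaviour described above can be sketched as a lookup table that maps source addresses to ports; the frame fields and addresses used here are simplified assumptions.

```python
# Minimal sketch of a learning switch: record which port each source address
# was seen on, then forward only to the learned port (or flood if unknown).
class LearningSwitch:
    def __init__(self, num_ports: int):
        self.num_ports = num_ports
        self.mac_table = {}          # source address -> port it was learned on

    def handle_frame(self, in_port: int, src: str, dst: str) -> list[int]:
        """Learn the sender's port and return the port(s) to forward out of."""
        self.mac_table[src] = in_port                 # "listen" and record
        if dst in self.mac_table:
            return [self.mac_table[dst]]              # forward to the known port only
        # Destination unknown: flood out of every port except the one it came in on.
        return [p for p in range(self.num_ports) if p != in_port]

switch = LearningSwitch(num_ports=4)
print(switch.handle_frame(in_port=0, src="aa:aa", dst="bb:bb"))  # flood: [1, 2, 3]
print(switch.handle_frame(in_port=1, src="bb:bb", dst="aa:aa"))  # learned: [0]
```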
Decentralization
In a mesh topology (i.e., a partially connected mesh topology), there are at least
two nodes with two or more paths between them to provide redundant paths to be
used in case the link providing one of the paths fails. This decentralization is often
used to advantage to compensate for the single-point-failure disadvantage that is
present when using a single device as a central node (e.g., in star and tree
networks). A special kind of mesh, limiting the number of hops between two nodes,
is a hypercube. The number of arbitrary forks in mesh networks makes them more
difficult to design and implement, but their decentralized nature makes them very
useful. This is similar in some ways to a grid network, where a linear or ring topology
is used to connect systems in multiple directions. A multi-dimensional ring has a
toroidal topology, for instance.
A fully connected network, complete topology or full mesh topology is a network
topology in which there is a direct link between all pairs of nodes. In a fully
connected network with n nodes, there are n(n-1)/2 direct links. Networks designed
with this topology are usually very expensive to set up, but provide a high degree of
reliability due to the multiple paths for data that are provided by the large number of
redundant links between nodes. This topology is mostly seen in military applications.
However, it can also be seen in the file sharing protocol BitTorrent in which users
connect to other users in the "swarm" by allowing each user sharing the file to
connect to other users also involved. In actual usage of BitTorrent, any given
node is rarely connected to every single other node, as it would be in a true fully
connected network, but the protocol does allow any node to connect to any other
node when sharing files.
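The n(n-1)/2 figure can be checked by enumerating every pair of nodes; the sketch below is purely illustrative.

```python
from itertools import combinations

# Minimal sketch: in a fully connected (full mesh) network, every pair of
# nodes shares a direct link, so the link count is n*(n-1)/2.
def full_mesh_links(nodes):
    return list(combinations(nodes, 2))

nodes = ["A", "B", "C", "D", "E"]                        # n = 5 (illustrative)
links = full_mesh_links(nodes)
print(len(links), len(nodes) * (len(nodes) - 1) // 2)    # 10 10
print(links[:3])                                         # [('A', 'B'), ('A', 'C'), ('A', 'D')]
```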
Hybrids
Hybrid networks use a combination of any two or more topologies in such a way that
the resulting network does not exhibit one of the standard topologies (e.g., bus, star,
ring, etc.). For example, a tree network connected to a tree network is still a tree
network, but two star networks connected together exhibit a hybrid network
topology. A hybrid topology is always produced when two different basic network
topologies are connected. Two common examples of hybrid networks are the
star-ring network and the star-bus network:
A Star ring network consists of two or more star topologies connected using a
multistation access unit (MAU) as a centralized hub.
A Star Bus network consists of two or more star topologies connected using a
bus trunk (the bus trunk serves as the network's backbone).
While grid networks have found popularity in high-performance computing
applications, some systems have used genetic algorithms to design custom networks
that have the fewest possible hops in between different nodes. Some of the resulting
layouts are nearly incomprehensible, although they function quite well.