
    Evolution Of Computers:

A Brief History Of Computers

    Prepared by Mubeen Ahmed

    A Brief History Of Computers

    !"History reveals a clear pattern in the evolution of computers. Processing powerincreases rapidly after the introduction of the new technology. The rate of growth

    eventually slows down as the technology is exploited to its full potential.While in the background other technologies are nurturing and one ultimately

    supersedes the other to become the dominant technology and this cycle is

    repeated.

    !"Under the right conditions the shift to the new technology can lead to possible

    increase in processor speed of hundred to thousand times

Electromechanical computer
All-electronic computer with vacuum tubes
Fully transistorized computer
Scalable massive parallelism


Machines for computational assistance: the abacus from China assisted with calculation.

1642: Blaise Pascal made the first machine that could add.

1672: Leibniz made a machine that could perform all four basic functions.

1822: Charles Babbage designed a machine that could add, subtract, multiply, divide and solve polynomial equations. This was the idea of a programmable machine. He never succeeded in building it, but designed the Analytical Engine: input, control, processor, store, output.

It inspired inventors, who made little improvement, but it also inspired a brilliant countess, Lady Ada Lovelace. She thought about the Analytical Engine's design and realized that DO and IF constructs would be necessary.

The British mathematician George Boole began to study the foundations of logic. An argument could be represented by x or y, but the result could only be True or False. He studied this in detail and found that AND, OR and NOT could be used together to analyze any proposition logically.

In 1886, the American logician Charles Sanders Peirce observed that the emerging electrical ON/OFF technology could be combined with Boole's work.

1937: George Stibitz of Bell Laboratories put this into practice, building an adder and then a multiplier, using Boole's and Peirce's work.


Howard Aiken used wheels controlled by electrical impulses, the beginning of electromechanical computational machines. The MARK I was built during World War II.

Mauchly and Eckert, electrical engineers at the University of Pennsylvania, were given a project to make the first fully electronic machine using vacuum tubes. ENIAC was completed after the war ended; it was a massive machine.

Von Neumann met Herman Goldstine by chance. He collaborated extensively with the ENIAC team; his efforts were aimed at using computers to solve real-world problems. This collaboration led to the most influential paper in the field, which formed the basis for the VON NEUMANN ARCHITECTURE.

FOUR STEP SYSTEM

Extract input one, extract input two, extract the instruction, store the output.

SCALAR PROCESSING

FLOPS (floating point operations per second) is a term used to compare the processing power of machines.
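As a rough, illustrative sketch of the idea (the function name and operation count below are made up for this example, and a pure-Python loop measures interpreter throughput rather than any machine's true peak rating), the snippet times a fixed number of floating point additions and divides by the elapsed wall-clock time:

```python
import time

# Time a fixed number of floating point additions and divide by the
# elapsed wall-clock time to get an operations-per-second estimate.
def estimate_flops(n_ops=10_000_000):
    x = 0.0
    start = time.perf_counter()
    for _ in range(n_ops):
        x += 1.0                    # one floating point addition per iteration
    elapsed = time.perf_counter() - start
    return n_ops / elapsed

print(f"~{estimate_flops():.2e} floating point additions per second")
```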


Transistors, invented at Bell Labs by John Bardeen, Walter Brattain and William Shockley, were introduced into computers in the 1950s.

More transistors could be placed on one chip, and they were very much faster.

The US Government intervened to accelerate the development: Remington Rand and IBM were given the challenge to make the first all-transistor machine, and Remington Rand won the contract.

The LARC was made with 60,000 transistors.

IBM worked in the background and made a 169,100-transistor machine, but was unable to reach the required speed.

After losing millions of dollars, both companies decided to move on to the more lucrative business market.

A vacuum formed on the high-performance computing side; this was later filled by Control Data Corporation, led by Seymour Cray, which would lead that market for the next two decades.

Integrated circuits, and then processors on a single chip, were introduced, and power consumption decreased. These integrated circuits marked the point at which speed began to increase more through design.

Seymour Cray implemented what became known as vectorization in processor design. For the task of multiplying 100 pairs of numbers and outputting the results, scalar processing costs roughly 9 * 100 instructions (the full instruction sequence is repeated for every pair), while vector processing costs roughly 100 + 9 instructions (the sequence is set up once and the operands are streamed through it).
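The same contrast shows up at the programming level. A minimal sketch, assuming the NumPy library is available (it is not mentioned in the original): the 100 products are computed once with an explicit per-element loop and once as a single vectorized operation.

```python
import numpy as np

# Scalar style: the multiply is issued once per pair of operands, so the
# instruction sequence is repeated 100 times.
a = list(range(100))
b = list(range(100))
scalar_result = [x * y for x, y in zip(a, b)]

# Vector style: a single operation is applied to all 100 elements at once;
# NumPy streams the operands through optimized native code instead of an
# interpreted loop.
va = np.array(a, dtype=float)
vb = np.array(b, dtype=float)
vector_result = va * vb

print(scalar_result[:5])   # [0, 1, 4, 9, 16]
print(vector_result[:5])   # [ 0.  1.  4.  9. 16.]
```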


    Computer Architectures

    Taxonomy of Architectures

    For computer architectures, Flynn proposed that the two dimensions be termed Instruction and

    Data, and that, for both of them, the two values they could take be Single or Multiple.

    Single Instruction, Single Data (SISD)

    This is the oldest style of computer architecture,and still one of the most important: all personalcomputers fit within this category. Singleinstructionrefers to the fact that there is onlyone instruction stream being acted on by theCPU during any one clock tick; single datameans, analogously, that one and only one datastream is being employed as input during anyone clock tick. These factors lead to two veryimportant characteristics of SISD stylecomputers:

    Serial Instructions are executed one afterthe other, in lock-step;

    Deterministic Examples: Most non-supercomputers


Multiple Instruction, Single Data (MISD)

Few actual examples of computers in this class exist. However, special-purpose machines are certainly conceivable that would fit into this niche: multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message. Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.

Single Instruction, Multiple Data (SIMD)

A very important class of architectures in the history of computation, single-instruction/multiple-data machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously. For certain classes of problems, e.g., those known as data-parallel problems, this type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple instruction units can all operate on them at the same time.

Synchronous (lock-step); Deterministic


    Multiple Instruction, Multiple Data (MIMD)

Many believe that the next major advances in computational capabilities will be enabled by this approach to parallelism, which provides for multiple instruction streams simultaneously applied to multiple data streams. The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.

Synchronous or asynchronous

MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode. Some kinds of algorithms require one or the other, and different kinds of MIMD systems are better suited to one or the other; optimum efficiency depends on making sure that the system you run your code on reflects the style of synchronicity required by your code.

    Non-deterministic

Multiple Instruction or Single Program

MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism, with much less strict synchronization requirements.
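A minimal SPMD sketch, assuming the mpi4py package and an MPI runtime are available (neither is named in the original): every process runs the identical program, and the rank it reads at startup determines which slice of the data it works on.

```python
# spmd_demo.py -- run with e.g. `mpiexec -n 4 python spmd_demo.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id: 0 .. size-1
size = comm.Get_size()          # total number of processes

# Single program, multiple data: every rank runs this same code but works
# on its own slice of the data.
data = list(range(100))
my_chunk = data[rank::size]
my_partial_sum = sum(my_chunk)

# Combine the partial results on rank 0.
total = comm.reduce(my_partial_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("total =", total)     # 4950
```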


    Terminology of Parallelism

Types of Parallelism: There are two basic ways to partition computational work among parallel tasks:

Data parallelism: each task performs the same series of calculations, but applies them to different data. For example, four processors can search census data looking for people above a certain income; each processor does the exact same operations, but works on different parts of the database.
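A small data-parallel sketch of the census example, using Python's standard concurrent.futures module; the record fields and the income threshold are invented for illustration. Every worker applies the exact same filter, each to a different slice of the records.

```python
from concurrent.futures import ProcessPoolExecutor

# Made-up census records and threshold, purely for illustration.
RECORDS = [{"name": f"person{i}", "income": 20_000 + 1_000 * i} for i in range(1_000)]
THRESHOLD = 100_000

def find_high_income(chunk):
    # Identical operations on every chunk; only the data differs per task.
    return [r["name"] for r in chunk if r["income"] > THRESHOLD]

if __name__ == "__main__":
    n_workers = 4
    chunks = [RECORDS[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(find_high_income, chunks)
    matches = [name for part in partial_results for name in part]
    print(len(matches), "people above the threshold")
```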

    Functional parallelism: each task performs different calculations, i.e., carries out

    different functions of the overall problem. This can be on the same data or different data.

    For example, 5 processors can model an ecosystem, with each processor simulating a

    different level of the food chain (plants, herbivores, carnivores, scavengers, and

    decomposers).
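A corresponding functional-parallel sketch: each worker runs a different function (standing in for one level of the food chain) rather than the same function on different data. The model functions here are placeholders, not real ecosystem equations.

```python
from concurrent.futures import ProcessPoolExecutor

# Placeholder "models": each stands in for a different function of the
# overall problem (one level of the food chain).
def model_plants(steps):     return sum(i * 0.10 for i in range(steps))
def model_herbivores(steps): return sum(i * 0.05 for i in range(steps))
def model_carnivores(steps): return sum(i * 0.01 for i in range(steps))

if __name__ == "__main__":
    tasks = [model_plants, model_herbivores, model_carnivores]
    # Functional parallelism: each worker executes a *different* function.
    with ProcessPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, 10_000) for fn in tasks]
        for fn, fut in zip(tasks, futures):
            print(fn.__name__, fut.result())
```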

    Task: A logically discrete section of computational work.

Parallel Tasks: tasks whose computations are independent of each other, so that all such tasks can be performed simultaneously with correct results.

Parallelizable Problem: a problem that can be divided into parallel tasks. This may require changes in the code and/or the underlying algorithm.

Example of a Parallelizable Problem: calculate the potential energy for each of several thousand independent conformations of a molecule; when done, find the minimum-energy conformation.
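A minimal sketch of that pattern with Python's multiprocessing.Pool; the energy function is a stand-in, not a real molecular-mechanics calculation. Each evaluation is independent, and only the final minimum is a serial step.

```python
from multiprocessing import Pool

# Stand-in for a real energy calculation: each conformation's "energy"
# depends only on its own id, so every evaluation is independent.
def potential_energy(conformation_id):
    return ((conformation_id * 37) % 101) / 10.0

if __name__ == "__main__":
    conformations = range(5_000)
    with Pool(processes=4) as pool:
        energies = pool.map(potential_energy, conformations)
    # The only serial step: pick the minimum once all energies are known.
    best = min(range(len(energies)), key=energies.__getitem__)
    print("minimum-energy conformation:", best, "energy:", energies[best])
```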

Example of a Non-parallelizable Problem: calculation of the Fibonacci series (1, 1, 2, 3, 5, 8, 13, 21, ...) by use of the formula:

F(k + 2) = F(k + 1) + F(k)

A non-parallelizable problem, such as the calculation of the Fibonacci sequence above, entails dependent calculations rather than independent ones.

Observed speedup of a code which has been parallelized =
(wall-clock time of serial execution) / (wall-clock time of parallel execution)
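For example, with hypothetical timings of 120 seconds for the serial run and 16 seconds for the parallel run on 8 processors:

```python
# Hypothetical timings, just to make the formula concrete.
serial_seconds = 120.0     # wall-clock time of the serial run (assumed)
parallel_seconds = 16.0    # wall-clock time of the parallel run on 8 processors (assumed)

observed_speedup = serial_seconds / parallel_seconds
print(observed_speedup)    # 7.5 -- slightly below the ideal 8x because of overhead
```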


Synchronization

The temporal coordination of parallel tasks. It involves waiting until two or more tasks reach a specified point (a sync point) before continuing any of the tasks.

Synchronization is needed to coordinate information exchange among tasks; e.g., in the previous example of finding the minimum-energy conformation, all of the conformations had to be completed before the minimum could be found, so any task that was dependent upon finding that minimum would have had to wait until it was found before continuing.

Synchronization can consume wall-clock time because processor(s) sit idle waiting for tasks on other processors to complete.

Synchronization can be a major factor in decreasing parallel speedup, because, as the previous point illustrates, the time spent waiting could have been spent in useful calculation, were synchronization not necessary.
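One way to express such a sync point, sketched here with Python's multiprocessing.Barrier (the per-task "energy" is just a random placeholder): no task may use the combined result until every task has reached the barrier.

```python
import multiprocessing as mp
import random

def worker(barrier, results, i):
    results[i] = random.random()     # stand-in for one task's energy result
    barrier.wait()                   # sync point: block until every task is done
    if i == 0:
        print("minimum energy:", min(results[:]))

if __name__ == "__main__":
    n = 4
    results = mp.Array('d', n)       # shared array of doubles
    barrier = mp.Barrier(n)
    procs = [mp.Process(target=worker, args=(barrier, results, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```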

    Parallel Overhead

Time to start a task. This involves, among other things:
identifying the task
locating a processor to run it
loading the task onto the processor
putting whatever data the task needs onto the processor
actually starting the task

Time to terminate a task. Termination isn't a simple chore, either: at the very least, results have to be combined or transferred, and operating system resources have to be freed before the processor can be used for other tasks.

    Synchronization time, as previously explained.


    Parallel Program Design :

    !"First we cover the ideal goals for a parallel solution. We review functional and

    data parallelism, and SPMD and Master Worker.!"Then we walk through 5 problem examples showing diagrams of possible parallel

    solutions.

    !"Problems faced in prallel programming

    Goals (ideal)

Ideal (read: unrealistic) goals for writing a program with maximum speedup and scalability:

Each process has a unique bit of work to do, and does not have to redo any other work in order to get its bit done.

    Each process stores the data needed to accomplish that work, and does not require anyone

    else's data.

    A given piece of data exists only on one process, and each bit of computation only needs

    to be done once, by one process.

    Communication between processes is minimized.

    Load is balanced; each process should be finished at the same time.

Usually it is much more complicated than this! Keep in mind that:

    There may be several parallel solutions to your problem.

    The best parallel solution may not flow directly from the best serial solution.


    Major Decisions


    Functional Parallelism?

    Partition by task (functional parallelism)

    Each process performs a different "function" or executes a

    different code section

    First identify functions, then look at the data requirements

    Data Parallelism?

    Each process does the same work on a unique piece of data

    "Owner computes" First divide the data. Each process then becomes responsible for

    whatever work is needed to process that data.

    Data placement is an essential part of a data-parallel algorithm

    Data parallelism is probably more scalable than functional parallelism


    Distributed memory programming models

Distributed memory architectures are fertile grounds for the use of many different styles of parallel programming, from those emphasizing homogeneity of process but heterogeneity of data, to full heterogeneity of both.

Data parallel

Many significant problems, over the entire computational complexity scale, fall into the data parallel model, which basically stands for "do the same thing to all this data":

Explicit data distribution (via directives)

The data is assumed to have some form of regularity, some geometric shape or other such characteristic by which it may be subdivided among the available processors, usually by use of directives commonly hidden from the executable code within program comment statements.

Single thread of control

Each processor in the distributed environment is loaded with a copy of the same code, hence single thread of control; it is not necessary, nor expected, that all processors will be synchronized in their execution of this code, although the amount of instruction separation is generally kept as small as possible in order to, among other things, maintain high levels of processor efficiency (i.e., if some processors have much more work to do than others, even though they're all running off the same code, then it'll turn out that some processors get finished long before the others do, and will simply be sitting there spinning, soaking up cycles and research bucks, until the other processors complete their tasks ... this is known as load imbalance, and we'll talk more about this later, but it should be obvious even now that it is a bad thing).

Examples: HPF

High Performance Fortran (HPF) is a standard in this sort of work.


    Key principles in explicit message passing programming

We're now going to discuss some general issues relevant to the construction of well-designed distributed applications which rely on explicit message passing for data and control communications. These principles are largely concerned with issues you should be focusing on as you consider the parallelization of your application:

How is memory going to be used, and from where?
How will the different parts of the application be coordinated?
What kinds of operations can be done collectively?
When should communications be blocking, and when non-blocking?
What kinds of synchronization considerations need to be addressed, and when?
What kinds of common problems could be encountered, and how can they be avoided?

As has been mentioned before, and as will be mentioned again: there's no substitute for a good design ... and the worse your design, the more time you'll spend debugging it.

"It must be emphasized that the machine does not think for itself. It may exercise some degree of judgment and discrimination, but the situations in which these are required, the criteria to be applied, and the actions to be taken according to the criteria, have all to be foreseen in the program of operating instructions furnished to the machine. Use of the machine is no substitute for thought on the basic organization of a computation, only for the labour of carrying out the details of the application of that thought."

Douglas R. Hartree, Moore School lecture, Univ. of Penn., 9 July 1946

Addressability

As one module in a distributed application, knowing what you know, and, for what you don't, who to ask, is one of the central issues in message passing applications. "What you know" is the data you have resident on your own processor; "what you don't know" is anything that resides elsewhere, but that you've discovered is necessary for you to find out.

The CPU can issue load/store operations involving local memory space only.

Requests for any data stored in a remote processor's memory must be converted by the programmer or a run-time library into message passing calls which copy data between local memories.

You not only have to know that you don't know something, or that something that you used to know is now out of date and needs refreshing ... you also need to know where to go to get the latest version of the information you're interested in.

    No shared variables or atomic global updates (e.g. counters, loop indices)

Synchronization is going to cost you, because there's no easy way to quickly get this kind of information to everybody ... that's just one of the defining characteristics of this model of operation, and if its implications are too detrimental to the effectiveness of your application, that's a good enough reason to explore other alternatives.

    Communication and Synchronization

The act of communicating within a distributed computing environment is very much a team effort, and has implications beyond that of simply getting information from processor A to processor B.

On multicomputers, all interprocessor communication, including synchronization, is implemented by passing messages (copying data) between processors.

Making sure that everyone is using the right value of variable x is, without question, a very important aspect of distributed computing; but so is making sure that no one tries to use that value before the rest of the pieces are in place, a matter of synchronization. Given that the only point of connection among all of the processing elements in a distributed environment lies in the messages that are exchanged, synchronization, then, must also be a matter of message passing.

In fact, synchronization is very often seen as a separable subset of all communication traffic, more a matter of control information than data.

Keep your synchronization requirements to the absolute minimum, and code them to be lean-and-mean, so that as little time is taken up in synchronization (and consequently away from meaningful computation) as possible.

All messages must be explicitly received (sends and receives must be paired).

Just like the junk mail that piles up in your mailbox and obscures the really important stuff (like your tax return, or the latest edition of TV Guide), messages that are sent but never explicitly received are a drain on network resources.
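A minimal paired send/receive sketch, again assuming mpi4py and an MPI runtime; the payload and tag are arbitrary. The send posted by rank 0 is matched by an explicit receive on rank 1.

```python
# sendrecv_demo.py -- run with e.g. `mpiexec -n 2 python sendrecv_demo.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"x": 3.14, "note": "latest value of x"}
    comm.send(payload, dest=1, tag=42)        # every send posted here ...
elif rank == 1:
    payload = comm.recv(source=0, tag=42)     # ... is matched by an explicit receive
    print("rank 1 received:", payload)
```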


Grain Size

Grain size loosely refers to the amount of computation that is done between communication or synchronization steps. Run time scales roughly as (T + S) * equally shared load, so S, the synchronization cost, is important.

Starvation

The amount of time a processor is interrupted to report its present state. This should not be large, or the processor will not have time to compute.

Deadlock

A set of processes is deadlocked if each process in the set holds a resource that another is waiting for, and none will release what it holds until it is granted the other resources it is waiting for.

You can try to detect a deadlock and kill a process, but this requires a monitoring system.

You can make deadlock impossible if you number your resources and request resources in ascending order, as sketched below.
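A small sketch of that rule using Python threads, with two locks standing in for numbered resources: because both workers always acquire resource 1 before resource 2, a circular wait cannot arise.

```python
import threading

lock_a = threading.Lock()   # resource 1
lock_b = threading.Lock()   # resource 2

def worker(name):
    # Every worker acquires the lower-numbered resource first, so a circular
    # wait (and therefore deadlock) cannot occur.
    with lock_a:
        with lock_b:
            print(f"{name} holds both resources")

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```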

Flooding and Throttling

For many parallel problems, the problem is broken down into further parallel tasks. This should not go so far that the number of tasks greatly exceeds the number of processors; if it does, the forward execution of the program will be severely impaired. Dynamic switching is a technique that might be used to jump between the two (generating more tasks versus throttling their creation).

Load Balancing

We can distribute the load as N/P tasks per processor, using either the floor or the ceiling. The ceiling has the advantage that one processor does not become the bottleneck, as the sketch below shows.
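A small sketch with made-up sizes (N = 10 tasks, P = 4 processors) showing why the ceiling rule avoids a single overloaded processor:

```python
import math

# Hypothetical sizes: N = 10 tasks to spread over P = 4 processors.
N, P = 10, 4

# floor(N/P) tasks everywhere, with the leftovers dumped on the last processor:
floor_chunks = [N // P] * (P - 1)
floor_chunks.append(N - sum(floor_chunks))              # [2, 2, 2, 4] -> one overloaded processor

# ceil(N/P) tasks per processor, with the last one taking whatever remains:
c = math.ceil(N / P)
ceil_chunks = [min(c, N - i * c) for i in range(P)]     # [3, 3, 3, 1] -> no single bottleneck

print("floor:", floor_chunks, "max load:", max(floor_chunks))
print("ceil: ", ceil_chunks, "max load:", max(ceil_chunks))
```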

Communication Bottlenecks

Communication is often the bottleneck of parallel computation; the question is how to remove it.

    Partitioning and Scheduling

    One of the most important tasks

    Scheduling might be static or dynamic

    Job Jar technique


    Costs of Parallel Processing

    By this point, I hope you will have gotten the joint message that:

Parallel processing can be extremely useful, but ... There Ain't No Such Thing As A Free Lunch.

Programmer's time

As the programmer, your time is largely going to be spent doing the following:

    Analyzing code for parallelism

The more significant parallelism you can find, not simply in the existing code, but even more importantly in the overall task that the code is intended to address, the more speedup you can expect to obtain for your efforts.

    Recoding

    Having discovered the places where you think parallelism will give results, you now have

    to put it in. This can be a very time-consuming process.

    Complicated debugging

Debugging a parallel application is at least an order of magnitude more infuriating, because you not only have multiple instruction streams running around doing things at the same time, you've also got information flowing amongst them all, again all at the same time, and who knows!?! what's causing the errors you're seeing?

It really is that bad. Trust me. Do whatever you can to avoid having to debug parallel code:

    consider a career change;

    hire someone else to do it;

or write the best, self-debugging, modular and error-correcting code you possibly can, the first time.

If you decide to stick with it, and follow the advice in that last point, you'll find that the time you put into writing good, well-designed code has a tremendous impact on how quickly you get it running correctly. Pay the price up front.

    and only for as long as you actually need them.