Post on 30-May-2018
8/14/2019 Evolution Computer1
1/17
Evolution Of Computers:
o A Brief History Of Computers
Prepared by Mubeen Ahmed
A Brief History Of Computers
- History reveals a clear pattern in the evolution of computers. Processing power increases rapidly after the introduction of a new technology. The rate of growth eventually slows as the technology is exploited to its full potential. Meanwhile, other technologies mature in the background, and one ultimately supersedes the other to become the dominant technology; this cycle is then repeated.
- Under the right conditions, the shift to a new technology can increase processor speed by a factor of a hundred to a thousand.
Electromechanical computer
All-electronic computer with vacuum tubes
Fully transistorized computer
Scalable massive parallelism
Machines for computational assistance
The abacus, from China, assisted calculation
1642: Blaise Pascal made the first machine that could add
1672: Leibniz made a machine that could perform all four basic arithmetic functions
1822: Charles Babbage worked on a machine to perform +, -, /, * and solve polynomial equations
Idea of a programmable machine
He never succeeded in building it, but he designed the Analytical Engine
Input -> control -> processor -> store -> output
Inspired inventors made little improvement
It also inspired a brilliant countess, Lady Ada Lovelace
She thought about the analytical design and realized that DO and IF constructs would be necessary
British mathematician George Boole started to study the foundations of logic
An argument could be represented by x or y, but the result could only be True or False
He studied this in detail and found that AND, OR and NOT could be used together to analyze any proposition logically
1807: American logician Charles Sanders Peirce observed that the upcoming electrical ON/OFF technology could be intertwined with Boole's work
1937: George Stibitz of Bell Laboratories built a practical adder, then a multiplier, etc., using Boole's and Peirce's work
Howard Aiken used wheels controlled by electrical impulses
Beginning of electrical computational machines
MARK I was built during World War II
Mauchly and Eckert were given a project to build the first completely electrical machine using vacuum tubes
Electrical engineers at UPenn
ENIAC was completed after the war ended. It was a massive machine
Von Neumann met Herman Goldstine by accident.
He collaborated extensively with the ENIAC team.
His aim was to use computers to solve real-world problems
This collaboration led to the most influential paper, which formed the basis for the VON NEUMANN ARCHITECTURE
FOUR STEP SYSTEM
Extract input one -> Extract input two -> Extract the instruction -> Store the output
SCALAR PROCESSING
FLOPS (floating-point operations per second) is a term used to compare the processing power of machines
Transistors were introduced in the 1950s by John Bardeen and William Shockley from Bell Labs
More transistors could be placed on one chip, and they were very much faster.
The US government intervened to accelerate the development. Remington Rand and IBM were given the challenge of building the first all-transistor machine.
Remington Rand won the contract
LARC was built with 60,000 transistors
IBM worked in the background and built a machine with 169,100 transistors, but was unable to reach the required speed
After losing millions of dollars, both companies decided to move to a more lucrative business market
This left a vacuum on the high-computation side, which was later taken up by Control Data Corporation, led by Seymour Cray, which would lead the market for the next two decades
Integrated circuits, and then processors on a single chip, were introduced
Power consumption decreased
These integrated circuits marked the beginning of speed increases achieved more by design
Seymour Cray implemented what was known as vectorization in processor design
Task: multiply 100 numbers
Scalar processing: 9 * 100 instructions
Output result
Vector processing: 100 + 9 instructions
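The instruction counts on this slide can be reproduced with a small sketch. It assumes, as the slide's figures suggest, that a scalar machine spends 9 instructions on every element, while a vector machine pays the 9 instructions once and then one per element. The function names and the cost model are illustrative assumptions, not a description of any particular Cray machine.

```python
def scalar_instruction_count(n_elements, instructions_per_element=9):
    """Scalar processing: the full instruction sequence is repeated per element."""
    return instructions_per_element * n_elements


def vector_instruction_count(n_elements, setup_instructions=9):
    """Vector processing (assumed model): one setup sequence, then one step per element."""
    return n_elements + setup_instructions


scalar = scalar_instruction_count(100)   # 9 * 100
vector = vector_instruction_count(100)   # 100 + 9
print(scalar, vector, round(scalar / vector, 2))
```

Under this model the advantage grows with the length of the vector, approaching a factor of 9 for very long vectors.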
Computer Architectures
Taxonomy of Architectures
For computer architectures, Flynn proposed that the two dimensions be termed Instruction and
Data, and that, for both of them, the two values they could take be Single or Multiple.
Single Instruction, Single Data (SISD)
This is the oldest style of computer architecture, and still one of the most important: all personal computers fit within this category. Single instruction refers to the fact that there is only one instruction stream being acted on by the CPU during any one clock tick; single data means, analogously, that one and only one data stream is being employed as input during any one clock tick. These factors lead to two very important characteristics of SISD-style computers:
Serial: instructions are executed one after the other, in lock-step;
Deterministic
Examples: most non-supercomputers
Multiple Instruction, Single Data (MISD)
Few actual examples of computers in this class exist;
However, special-purpose machines are certainly conceivable that would fit into this niche: multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message. Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.
Single Instruction, Multiple Data (SIMD)
A very important class of architectures in the history of computation, single-instruction/multiple-data machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously. For certain classes of problems, e.g., those known as data-parallel problems, this type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple processing units can all operate on them at the same time.
Synchronous (lock-step)
Deterministic
Multiple Instruction, Multiple Data (MIMD)
Many believe that the next major advances in computational capabilities will be enabled by this approach to parallelism, which provides for multiple instruction streams simultaneously applied to multiple data streams. The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.
Synchronous or asynchronous
MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode. Some kinds of algorithms require one or the other, and different kinds of MIMD systems are better suited to one or the other; optimum efficiency depends on making sure that the system you run your code on reflects the style of synchronicity required by your code.
Non-deterministic
Multiple Instruction or Single Program
MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism, with much less strict synchronization requirements.
Terminology of Parallelism
Types of Parallelism: There are two basic ways to partition computational work among parallel tasks:
Data parallelism: each task performs the same series of calculations, but applies them to different data. For example, four processors can search census data looking for people
above a certain income; each processor does the exact same operations, but works on
different parts of the database.
Functional parallelism: each task performs different calculations, i.e., carries out
different functions of the overall problem. This can be on the same data or different data.
For example, 5 processors can model an ecosystem, with each processor simulating a
different level of the food chain (plants, herbivores, carnivores, scavengers, and
decomposers).
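The two ways of partitioning work can be sketched in a few lines. This is a minimal illustration, assuming a plain list of incomes stands in for the census database and a thread pool stands in for the processors; the function names are hypothetical. In CPython, threads will not speed up CPU-bound work like this, so the sketch shows only how the work is divided.

```python
from concurrent.futures import ThreadPoolExecutor


def count_above(incomes, threshold):
    """The operation every data-parallel task performs on its own piece of data."""
    return sum(1 for income in incomes if income > threshold)


def data_parallel_count(incomes, threshold, workers=4):
    """Data parallelism: same calculation, different slices of the data."""
    chunk = max(1, -(-len(incomes) // workers))  # ceiling division, at least 1
    parts = [incomes[i:i + chunk] for i in range(0, len(incomes), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_above, parts, [threshold] * len(parts)))


def functional_parallel_stats(incomes):
    """Functional parallelism: different calculations, here on the same data."""
    functions = {"lowest": min, "highest": max, "total": sum}
    with ThreadPoolExecutor(max_workers=len(functions)) as pool:
        futures = {name: pool.submit(fn, incomes) for name, fn in functions.items()}
        return {name: future.result() for name, future in futures.items()}


incomes = list(range(0, 100000, 1000))  # made-up sample data
print(data_parallel_count(incomes, 50000))
print(functional_parallel_stats(incomes))
```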
Task: A logically discrete section of computational work.
Parallel Tasks : Tasks whose computations are independent of each other, so that
all such tasks can be performed simultaneously with correct results.
Parallelizable Problem: A problem that can be divided into parallel tasks. This may require changes in the code and/or the underlying algorithm.
Example of Parallelizable Problem: Calculate the potential energy for each of several thousand independent conformations of a molecule; when done, find the minimum energy conformation.
Example of a Non-parallelizable Problem: Calculation of the Fibonacci series (1,1,2,3,5,8,13,21,...) by use of the formula:
F(k + 2) = F(k + 1) + F(k)
A non-parallelizable problem, such as the calculation of the Fibonacci sequence above, would entail dependent calculations rather than independent ones.
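The contrast between the two examples can be made concrete. In the sketch below, `potential_energy` is a hypothetical stand-in (a sum of squares, not a real molecular model): every conformation is evaluated independently, so the calls can be handed to separate workers. The Fibonacci loop cannot be split the same way when computed with this recurrence, because each step needs the two results before it.

```python
from concurrent.futures import ThreadPoolExecutor


def potential_energy(conformation):
    """Hypothetical stand-in for an energy function; depends on one conformation only."""
    return sum(x * x for x in conformation)


def minimum_energy(conformations, workers=4):
    """Independent evaluations run as parallel tasks; the minimum is found afterwards."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        energies = list(pool.map(potential_energy, conformations))
    best = min(range(len(energies)), key=energies.__getitem__)
    return best, energies[best]


def fibonacci(n):
    """Dependent calculations: F(k + 2) needs F(k + 1) and F(k) first."""
    series = [1, 1]
    while len(series) < n:
        series.append(series[-1] + series[-2])
    return series[:n]


print(minimum_energy([(3, 4), (1, 1), (2, 0)]))
print(fibonacci(8))
```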
Observed speedup of a code which has been parallelized =
    wall-clock time of serial execution
    ---------------------------------------
    wall-clock time of parallel execution
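As a worked example of the ratio above, the sketch below computes the observed speedup from two measured wall-clock times. The timings are made-up numbers for illustration, and the function name is an assumption.

```python
def observed_speedup(serial_seconds, parallel_seconds):
    """Observed speedup = serial wall-clock time / parallel wall-clock time."""
    if parallel_seconds <= 0:
        raise ValueError("parallel wall-clock time must be positive")
    return serial_seconds / parallel_seconds


# Hypothetical measurements: 120 s serially, 40 s after parallelization.
print(observed_speedup(120.0, 40.0))
```

A value below 1 would mean the parallel version is slower than the serial one, which can happen when overhead outweighs the work that was divided.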
Synchronization
The temporal coordination of parallel tasks. It involves waiting until two or more tasks reach a specified point (a sync point) before continuing any of the tasks.
Synchronization is needed to coordinate information exchange among tasks; e.g., the previous example finding minimum energy conformation: all of the conformations had to be completed before the minimum could be found, so any task that was dependent upon finding that minimum would have had to wait until it was found before continuing.
Synchronization can consume wall-clock time because processor(s) sit idle waiting for tasks on other processors to complete.
Synchronization can be a major factor in decreasing parallel speedup, because, as the previous point illustrates, the time spent waiting could have been spent in useful calculation, were synchronization not necessary.
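A sync point can be sketched with Python's `threading.Barrier`. In this minimal illustration, which assumes at least one work item and uses a squared value as a placeholder for real computation, every task finishes its independent phase, waits at the barrier, and only then reads the minimum over all results. Without the barrier, a fast task could read results that slower tasks have not yet written.

```python
import threading


def run_with_sync_point(work_items):
    """Each task computes independently, then waits at a sync point before using all results."""
    n_tasks = len(work_items)
    results = [None] * n_tasks
    minimum_seen = [None] * n_tasks
    barrier = threading.Barrier(n_tasks)

    def task(rank):
        results[rank] = work_items[rank] ** 2   # independent phase (placeholder work)
        barrier.wait()                          # sync point: idle until all tasks arrive
        minimum_seen[rank] = min(results)       # dependent phase: needs every result

    threads = [threading.Thread(target=task, args=(rank,)) for rank in range(n_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, minimum_seen


print(run_with_sync_point([3, -1, 4, 2]))
```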
Parallel Overhead
Time to start a task
This involves, among other things:
identifying the task
locating a processor to run it
loading the task onto the processor
putting whatever data the task needs onto the processor
actually starting the task
Time to terminate a task
Termination isn't a simple chore, either: at the very least, results have to be combined or transferred, and operating system resources have to be freed before the processor can be used for other tasks.
Synchronization time, as previously explained.
Parallel Program Design:
- First we cover the ideal goals for a parallel solution. We review functional and data parallelism, and SPMD and Master Worker.
- Then we walk through 5 problem examples showing diagrams of possible parallel solutions.
- Problems faced in parallel programming
Goals (ideal)
Ideal (read: unrealistic) goals for writing a program with maximum speedup and scalability:
Each process has a unique bit of work to do, and does not have to redo any other work in order to get its bit done.
Each process stores the data needed to accomplish that work, and does not require anyone
else's data.
A given piece of data exists only on one process, and each bit of computation only needs
to be done once, by one process.
Communication between processes is minimized.
Load is balanced; each process should be finished at the same time.
Usually it is much more complicated than this!
Keep in mind that:
There may be several parallel solutions to your problem.
The best parallel solution may not flow directly from the best serial solution.
Major Decisions
Functional Parallelism?
Partition by task (functional parallelism)
Each process performs a different "function" or executes a
different code section
First identify functions, then look at the data requirements
Data Parallelism?
Each process does the same work on a unique piece of data
"Owner computes": first divide the data. Each process then becomes responsible for whatever work is needed to process that data.
Data placement is an essential part of a data-parallel algorithm
Data parallelism is probably more scalable than functional parallelism
Distributed memory programming models
Distributed memory architectures are fertile grounds for the use of many different styles of parallel programming, from those emphasizing homogeneity of process but heterogeneity of data, to full heterogeneity of both.
Data parallel
Many significant problems, over the entire computational complexity scale, fall into the data parallel model, which basically stands for "do the same thing to all this data":
Explicit data distribution (via directives)
The data is assumed to have some form of regularity, some geometric shape or other such characteristic by which it may be subdivided among the available processors, usually by use of directives commonly hidden from the executable code within program comment statements.
Single thread of control
Each processor in the distributed environment is loaded with a copy of the same code, hence single thread of control; it is not necessary, nor expected, that all processors will be synchronized in their execution of this code, although the amount of instruction-separation is generally kept as small as possible in order to, among other things, maintain high levels of processor efficiency. If some processors have much more work to do than others, even though they're all running off the same code, then some processors will finish long before the others do, and will simply sit there spinning, soaking up cycles and research bucks, until the other processors complete their tasks. This is known as load-imbalance, and we'll talk more about it later, but it should be obvious even now that it is a bad thing.
Examples: HPF
High Performance Fortran (HPF) is a standard in this sort of work
Key principles in explicit message passing programming
We're now going to discuss some general issues relevant to the construction of well-designed distributed applications which rely on explicit message passing for data and control communications. These principles are largely concerned with issues you should be focusing on as you consider the parallelization of your application:
How is memory going to be used, and from where?
How will the different parts of the application be coordinated?
What kinds of operations can be done collectively?
When should communications be blocking, and when non-blocking?
What kinds of synchronization considerations need to be addressed, and when?
What kinds of common problems could be encountered, and how can they be avoided?
As has been mentioned before, and as will be mentioned again:
There's no substitute for a good design ... and the worse your design, the more time you'll spend debugging it.
"It must be emphasized that the machine does not think for itself. It may exercise some degree of judgment and discrimination, but the situations in which these are required, the criteria to be applied, and the actions to be taken according to the criteria, have all to be foreseen in the program of operating instructions furnished to the machine. Use of the machine is no substitute for thought on the basic organization of a computation, only for the labour of carrying out the details of the application of that thought."
Douglas R. Hartree, Moore School lecture, Univ. of Penn., 9 July 1946
Addressability
As one module in a distributed application, knowing what you know, and, for what you don't, who to ask, is one of the central issues in message passing applications. "What you know" is the data you have resident on your own processor; "what you don't know" is anything that resides elsewhere, but you've discovered is necessary for you to find out.
CPU can issue load/store operations involving local memory space only
Requests for any data stored in a remote processor's memory must be converted by the programmer or run-time library into message passing calls which copy data between local memories.
You not only have to know that you don't know something, or that something that you used to know is now out-of-date and needs refreshing ... you also need to know where to go to get the latest version of the information you're interested in.
No shared variables or atomic global updates (e.g. counters, loop indices)
Synchronization is going to cost you, because there's no easy way to quickly get this kind of information to everybody ... that's just one of the defining characteristics of this model of operation, and if its implications are too detrimental to the effectiveness of your application, that's a good enough reason to explore other alternatives.
Communication and Synchronization
The act of communicating within a distributed computing environment is very much a team effort, and has implications beyond that of simply getting information from processor-a to processor-b.
On multicomputers, all interprocessor communication, including synchronization, is implemented by passing messages (copying data) between processors
Making sure that everyone is using the right value of variable x is, without question, a very important aspect of distributed computing; but so is making sure that no one tries to use that value before the rest of the pieces are in place, a matter of synchronization. Given that the only point of connection among all of the processing elements in a distributed environment lies in the messages that are exchanged, synchronization, then, must also be a matter of message-passing.
In fact, synchronization is very often seen as a separable subset of all communication traffic, more a matter of control information than data. Keep your synchronization requirements to the absolute minimum, and code them to be lean-and-mean so that as little time is taken up in synchronization (and consequently away from meaningful computation) as possible.
All messages must be explicitly received (sends and receives must be paired)
Just like the junk mail that piles up in your mailbox and obscures the really important stuff (like your tax return, or the latest edition of TV-Guide), messages that are sent but never explicitly received are a drain on network resources.
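The pairing rule can be illustrated on a single machine. The sketch below is an analogy only: a `queue.Queue` shared by two threads stands in for the network, and the names `exchange`, `sender` and `receiver` are hypothetical. Every `put` (send) is matched by a `get` (receive), and a sentinel message tells the receiver when to stop, so nothing is left sitting in the mailbox.

```python
import queue
import threading


def exchange(values):
    """Send every value through a mailbox and receive each one explicitly."""
    mailbox = queue.Queue()
    done = object()          # sentinel: marks the end of the message stream
    received = []

    def sender():
        for value in values:
            mailbox.put(value)       # send
        mailbox.put(done)

    def receiver():
        while True:
            message = mailbox.get()  # matching receive; blocks until a message arrives
            if message is done:
                break
            received.append(message)

    threads = [threading.Thread(target=sender), threading.Thread(target=receiver)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return received, mailbox.empty()


print(exchange([1, 2, 3]))
```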
Grain Size
Grain size loosely refers to the amount of computation that is done between communication or synchronization events
( T + S ) * equally shared load
So S is important
Starvation
The amount of time a processor is
interrupted to report its present state
Should not be large or the processor
will not have time to compute
Deadlock
A set of processes is deadlocked if each process in the set holds a resource and none will release it until it has been granted the other resources it is waiting for
You can try to detect a deadlock and kill a process, but this requires a monitoring system
You can make deadlock impossible if you number your resources and request them in ascending order.
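The ascending-order rule can be sketched with two numbered locks. In this minimal illustration (the helper names and the demo are assumptions), one worker asks for resources 1 and 2 and the other asks for 2 and 1. Because both requests are sorted before acquisition, neither worker can hold resource 2 while waiting for resource 1, so the circular wait that defines a deadlock cannot form.

```python
import threading


def acquire_in_order(locks_by_number, needed):
    """Acquire the requested locks lowest number first; return the order used."""
    ordered = sorted(needed)
    for number in ordered:
        locks_by_number[number].acquire()
    return ordered


def release_all(locks_by_number, held):
    """Release the held locks in reverse order of acquisition."""
    for number in reversed(held):
        locks_by_number[number].release()


def ordered_locking_demo(rounds=200):
    locks = {1: threading.Lock(), 2: threading.Lock()}
    counter = {"value": 0}

    def worker(needed):
        for _ in range(rounds):
            held = acquire_in_order(locks, needed)
            try:
                counter["value"] += 1   # protected by both locks
            finally:
                release_all(locks, held)

    threads = [threading.Thread(target=worker, args=([1, 2],)),
               threading.Thread(target=worker, args=([2, 1],))]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter["value"]


print(ordered_locking_demo())
```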
Flooding and Throttling
For many parallel problems, the problem is broken down into further parallel tasks
This should not go so far that the number of tasks exceeds the number of processors; if this happens, the forward execution of the program will be severely impaired
Dynamic switching is a technique that might be used to jump between the two.
Load Balancing
We can distribute the load as N/P items per processor, rounded down (floor) or rounded up (ceiling)
Ceiling has the advantage that one
processor does not become the bottleneck
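The difference between the two roundings can be seen by splitting N items over P processors both ways. The sketch assumes one plausible reading of the slide: with floor, every processor gets floor(N/P) items and the leftover is added to the last processor; with ceiling, processors take ceil(N/P) items until none remain. The function names are illustrative.

```python
def floor_partition(n_items, n_procs):
    """floor(N/P) items each; the remainder lands on the last processor (assumed policy)."""
    base = n_items // n_procs
    sizes = [base] * n_procs
    sizes[-1] += n_items - base * n_procs
    return sizes


def ceiling_partition(n_items, n_procs):
    """ceil(N/P) items each until the items run out; later processors may get fewer."""
    chunk = -(-n_items // n_procs)  # ceiling division
    sizes = []
    remaining = n_items
    for _ in range(n_procs):
        take = min(chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


print(floor_partition(10, 4), ceiling_partition(10, 4))
```

For N = 10 and P = 4, the heaviest load is 4 items under the floor scheme and 3 under the ceiling scheme. Since the run finishes only when the most loaded processor does, the ceiling split avoids a single overloaded straggler.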
Communication Bottlenecks
Identify which part is the bottleneck of the parallel computation and how to remove it.
Partitioning and Scheduling
One of the most important tasks
Scheduling might be static or dynamic
Job Jar technique
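A job jar is read here as a shared pool of pending jobs from which each idle worker draws its next piece of work, which makes the scheduling dynamic rather than fixed in advance. The sketch below is one possible rendering using `queue.Queue` and threads; the squaring step is placeholder work and all names are assumptions.

```python
import queue
import threading


def run_job_jar(jobs, n_workers=3):
    """Workers repeatedly take the next job from a shared jar until it is empty."""
    jar = queue.Queue()
    for index, job in enumerate(jobs):
        jar.put((index, job))

    results = [None] * len(jobs)
    jobs_done = [0] * n_workers   # how many jobs each worker ended up taking

    def worker(worker_id):
        while True:
            try:
                index, job = jar.get_nowait()
            except queue.Empty:
                return                      # jar is empty: nothing left to schedule
            results[index] = job * job      # placeholder for real work
            jobs_done[worker_id] += 1       # each worker writes only its own slot

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, jobs_done


results, jobs_done = run_job_jar(list(range(10)))
print(results, sum(jobs_done))
```

How the jobs are divided among workers varies from run to run, which is the point: a worker that finishes early simply takes more jobs, so no static assignment is needed.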
Costs of Parallel Processing
By this point, I hope you will have gotten the joint message that:
Parallel processing can be extremely useful, but...
There Ain't No Such Thing As A Free Lunch
Programmer's time
As the programmer, your time is largely going to be spent doing the following:
Analyzing code for parallelism
The more significant parallelism you can find, not simply in the existing code, but even more importantly in the overall task that the code is intended to address, the more speedup you can expect to obtain for your efforts.
Recoding
Having discovered the places where you think parallelism will give results, you now have to put it in. This can be a very time-consuming process.
Complicated debugging
Debugging a parallel application is at least an order of magnitude more infuriating, because you not only have multiple instruction streams running around doing things at the same time, you've also got information flowing amongst them all, again all at the same time, and who knows!?! what's causing the errors you're seeing?
It really is that bad. Trust me.
Do whatever you can to avoid having to debug parallel code:
consider a career change;
hire someone else to do it;
or write the best, self-debugging, modular and error-correcting code you possibly can, the first time.
If you decide to stick with it, and follow the advice in that last point, you'll find that the time you put into writing good, well-designed code has a tremendous impact on how quickly you get it running correctly. Pay the price up front.
and only for as long as you actually need them.