
    Evolution Of Computers:

A Brief History Of Computers

    Prepared by Mubeen Ahmed

    A Brief History Of Computers

    !"History reveals a clear pattern in the evolution of computers. Processing powerincreases rapidly after the introduction of the new technology. The rate of growth

    eventually slows down as the technology is exploited to its full potential.While in the background other technologies are nurturing and one ultimately

    supersedes the other to become the dominant technology and this cycle is

    repeated.

    !"Under the right conditions the shift to the new technology can lead to possible

    increase in processor speed of hundred to thousand times

Electromechanical computer
All-electronic computer with vacuum tubes
Fully transistorized computer
Scalable massive parallelism


Machines for computational assistance: the abacus from China assisted with calculation.

1642: Blaise Pascal made the first machine that could add.

1672: Leibniz made a machine that could perform all four basic functions.

1822: Charles Babbage designed a machine that could add, subtract, multiply, divide and solve polynomial equations. This was the idea of a programmable machine. He never succeeded in building it, but designed the Analytical Engine: input, control, processor, store, output.

It inspired inventors, who made little improvement, but it also inspired a brilliant countess, Lady Ada Lovelace. She thought about the Analytical Engine's design and realized that DO and IF constructs would be necessary.

The British mathematician George Boole began to study the foundations of logic. An argument could be represented by x or y, but the result could only be True or False. He studied this in detail and found that AND, OR and NOT could be used together to analyze any proposition logically.

In 1886, the American logician Charles Sanders Peirce observed that the emerging electrical ON/OFF technology could be combined with Boole's work.

1937: George Stibitz of Bell Laboratories put this into practice, building an adder and then a multiplier, using Boole's and Peirce's work.


Howard Aiken used wheels controlled by electrical impulses, the beginning of electromechanical computational machines. The MARK I was built during World War II.

Mauchly and Eckert, electrical engineers at the University of Pennsylvania, were given a project to make the first fully electronic machine using vacuum tubes. ENIAC was completed after the war ended; it was a massive machine.

Von Neumann met Herman Goldstine by chance. He collaborated extensively with the ENIAC team; his efforts were aimed at using computers to solve real-world problems. This collaboration led to the most influential paper in the field, which formed the basis for the VON NEUMANN ARCHITECTURE.

FOUR STEP SYSTEM

Extract input one, extract input two, extract the instruction, store the output.

SCALAR PROCESSING

FLOPS (floating point operations per second) is a term used to compare the processing power of machines.
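As a rough, illustrative sketch of the idea (the function name and operation count below are made up for this example, and a pure-Python loop measures interpreter throughput rather than any machine's true peak rating), the snippet times a fixed number of floating point additions and divides by the elapsed wall-clock time:

```python
import time

# Time a fixed number of floating point additions and divide by the
# elapsed wall-clock time to get an operations-per-second estimate.
def estimate_flops(n_ops=10_000_000):
    x = 0.0
    start = time.perf_counter()
    for _ in range(n_ops):
        x += 1.0                    # one floating point addition per iteration
    elapsed = time.perf_counter() - start
    return n_ops / elapsed

print(f"~{estimate_flops():.2e} floating point additions per second")
```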


Transistors, invented at Bell Labs by John Bardeen, Walter Brattain and William Shockley, were introduced into computers in the 1950s.

More transistors could be placed on one chip, and they were very much faster.

The US Government intervened to accelerate the development: Remington Rand and IBM were given the challenge to make the first all-transistor machine, and Remington Rand won the contract.

The LARC was made with 60,000 transistors.

IBM worked in the background and made a 169,100-transistor machine, but was unable to reach the required speed.

After losing millions of dollars, both companies decided to move on to the more lucrative business market.

A vacuum formed on the high-performance computing side; this was later filled by Control Data Corporation, led by Seymour Cray, which would lead that market for the next two decades.

Integrated circuits, and then processors on a single chip, were introduced, and power consumption decreased. These integrated circuits marked the point at which speed began to increase more through design.

Seymour Cray implemented what became known as vectorization in processor design. For the task of multiplying 100 pairs of numbers and outputting the results, scalar processing costs roughly 9 * 100 instructions (the full instruction sequence is repeated for every pair), while vector processing costs roughly 100 + 9 instructions (the sequence is set up once and the operands are streamed through it).
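The same contrast shows up at the programming level. A minimal sketch, assuming the NumPy library is available (it is not mentioned in the original): the 100 products are computed once with an explicit per-element loop and once as a single vectorized operation.

```python
import numpy as np

# Scalar style: the multiply is issued once per pair of operands, so the
# instruction sequence is repeated 100 times.
a = list(range(100))
b = list(range(100))
scalar_result = [x * y for x, y in zip(a, b)]

# Vector style: a single operation is applied to all 100 elements at once;
# NumPy streams the operands through optimized native code instead of an
# interpreted loop.
va = np.array(a, dtype=float)
vb = np.array(b, dtype=float)
vector_result = va * vb

print(scalar_result[:5])   # [0, 1, 4, 9, 16]
print(vector_result[:5])   # [ 0.  1.  4.  9. 16.]
```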


    Computer Architectures

    Taxonomy of Architectures

    For computer architectures, Flynn proposed that the two dimensions be termed Instruction and

    Data, and that, for both of them, the two values they could take be Single or Multiple.

    Single Instruction, Single Data (SISD)

    This is the oldest style of computer architecture,and still one of the most important: all personalcomputers fit within this category. Singleinstructionrefers to the fact that there is onlyone instruction stream being acted on by theCPU during any one clock tick; single datameans, analogously, that one and only one datastream is being employed as input during anyone clock tick. These factors lead to two veryimportant characteristics of SISD stylecomputers:

    Serial Instructions are executed one afterthe other, in lock-step;

    Deterministic Examples: Most non-supercomputers


Multiple Instruction, Single Data (MISD)

Few actual examples of computers in this class exist. However, special-purpose machines are certainly conceivable that would fit into this niche: multiple frequency filters operating on a single signal stream, or multiple cryptography algorithms attempting to crack a single coded message. Both of these are examples of this type of processing, where multiple, independent instruction streams are applied simultaneously to a single data stream.

Single Instruction, Multiple Data (SIMD)

A very important class of architectures in the history of computation, single-instruction/multiple-data machines are capable of applying the exact same instruction stream to multiple streams of data simultaneously. For certain classes of problems, e.g., those known as data-parallel problems, this type of architecture is perfectly suited to achieving very high processing rates, as the data can be split into many different independent pieces, and the multiple instruction units can all operate on them at the same time.

Synchronous (lock-step); Deterministic


    Multiple Instruction, Multiple Data (MIMD)

Many believe that the next major advances in computational capabilities will be enabled by this approach to parallelism, which provides for multiple instruction streams simultaneously applied to multiple data streams. The most general of all of the major categories, a MIMD machine is capable of being programmed to operate as if it were in fact any of the four.

Synchronous or asynchronous

MIMD instruction streams can potentially be executed either synchronously or asynchronously, i.e., either in tightly controlled lock-step or in a more loosely bound "do your own thing" mode. Some kinds of algorithms require one or the other, and different kinds of MIMD systems are better suited to one or the other; optimum efficiency depends on making sure that the system you run your code on reflects the style of synchronicity required by your code.

    Non-deterministic

Multiple Instruction or Single Program

MIMD-style systems are capable of running in true "multiple-instruction" mode, with every processor doing something different, or every processor can be given the same code; this latter case is called SPMD, "Single Program Multiple Data", and is a generalization of SIMD-style parallelism, with much less strict synchronization requirements.
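A minimal SPMD sketch, assuming the mpi4py package and an MPI runtime are available (neither is named in the original): every process runs the identical program, and the rank it reads at startup determines which slice of the data it works on.

```python
# spmd_demo.py -- run with e.g. `mpiexec -n 4 python spmd_demo.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()          # this process's id: 0 .. size-1
size = comm.Get_size()          # total number of processes

# Single program, multiple data: every rank runs this same code but works
# on its own slice of the data.
data = list(range(100))
my_chunk = data[rank::size]
my_partial_sum = sum(my_chunk)

# Combine the partial results on rank 0.
total = comm.reduce(my_partial_sum, op=MPI.SUM, root=0)
if rank == 0:
    print("total =", total)     # 4950
```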


    Terminology of Parallelism

Types of Parallelism: There are two basic ways to partition computational work among parallel tasks:

Data parallelism: each task performs the same series of calculations, but applies them to different data. For example, four processors can search census data looking for people above a certain income; each processor does the exact same operations, but works on different parts of the database.
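A small data-parallel sketch of the census example, using Python's standard concurrent.futures module; the record fields and the income threshold are invented for illustration. Every worker applies the exact same filter, each to a different slice of the records.

```python
from concurrent.futures import ProcessPoolExecutor

# Made-up census records and threshold, purely for illustration.
RECORDS = [{"name": f"person{i}", "income": 20_000 + 1_000 * i} for i in range(1_000)]
THRESHOLD = 100_000

def find_high_income(chunk):
    # Identical operations on every chunk; only the data differs per task.
    return [r["name"] for r in chunk if r["income"] > THRESHOLD]

if __name__ == "__main__":
    n_workers = 4
    chunks = [RECORDS[i::n_workers] for i in range(n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partial_results = pool.map(find_high_income, chunks)
    matches = [name for part in partial_results for name in part]
    print(len(matches), "people above the threshold")
```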

    Functional parallelism: each task performs different calculations, i.e., carries out

    different functions of the overall problem. This can be on the same data or different data.

    For example, 5 processors can model an ecosystem, with each processor simulating a

    different level of the food chain (plants, herbivores, carnivores, scavengers, and

    decomposers).
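A corresponding functional-parallel sketch: each worker runs a different function (standing in for one level of the food chain) rather than the same function on different data. The model functions here are placeholders, not real ecosystem equations.

```python
from concurrent.futures import ProcessPoolExecutor

# Placeholder "models": each stands in for a different function of the
# overall problem (one level of the food chain).
def model_plants(steps):     return sum(i * 0.10 for i in range(steps))
def model_herbivores(steps): return sum(i * 0.05 for i in range(steps))
def model_carnivores(steps): return sum(i * 0.01 for i in range(steps))

if __name__ == "__main__":
    tasks = [model_plants, model_herbivores, model_carnivores]
    # Functional parallelism: each worker executes a *different* function.
    with ProcessPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(fn, 10_000) for fn in tasks]
        for fn, fut in zip(tasks, futures):
            print(fn.__name__, fut.result())
```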

    Task: A logically discrete section of computational work.

Parallel Tasks: tasks whose computations are independent of each other, so that all such tasks can be performed simultaneously with correct results.

Parallelizable Problem: a problem that can be divided into parallel tasks. This may require changes in the code and/or the underlying algorithm.

Example of a Parallelizable Problem: calculate the potential energy for each of several thousand independent conformations of a molecule; when done, find the minimum-energy conformation.
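A minimal sketch of that pattern with Python's multiprocessing.Pool; the energy function is a stand-in, not a real molecular-mechanics calculation. Each evaluation is independent, and only the final minimum is a serial step.

```python
from multiprocessing import Pool

# Stand-in for a real energy calculation: each conformation's "energy"
# depends only on its own id, so every evaluation is independent.
def potential_energy(conformation_id):
    return ((conformation_id * 37) % 101) / 10.0

if __name__ == "__main__":
    conformations = range(5_000)
    with Pool(processes=4) as pool:
        energies = pool.map(potential_energy, conformations)
    # The only serial step: pick the minimum once all energies are known.
    best = min(range(len(energies)), key=energies.__getitem__)
    print("minimum-energy conformation:", best, "energy:", energies[best])
```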

Example of a Non-parallelizable Problem: calculation of the Fibonacci series (1, 1, 2, 3, 5, 8, 13, 21, ...) by use of the formula:

F(k + 2) = F(k + 1) + F(k)

A non-parallelizable problem, such as the calculation of the Fibonacci sequence above, entails dependent calculations rather than independent ones.

Observed speedup of a code which has been parallelized =
(wall-clock time of serial execution) / (wall-clock time of parallel execution)
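For example, with hypothetical timings of 120 seconds for the serial run and 16 seconds for the parallel run on 8 processors:

```python
# Hypothetical timings, just to make the formula concrete.
serial_seconds = 120.0     # wall-clock time of the serial run (assumed)
parallel_seconds = 16.0    # wall-clock time of the parallel run on 8 processors (assumed)

observed_speedup = serial_seconds / parallel_seconds
print(observed_speedup)    # 7.5 -- slightly below the ideal 8x because of overhead
```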


Synchronization

The temporal coordination of parallel tasks. It involves waiting until two or more tasks reach a specified point (a sync point) before continuing any of the tasks.

Synchronization is needed to coordinate information exchange among tasks; e.g., in the previous example of finding the minimum-energy conformation, all of the conformations had to be completed before the minimum could be found, so any task that was dependent upon finding that minimum would have had to wait until it was found before continuing.

Synchronization can consume wall-clock time because processor(s) sit idle waiting for tasks on other processors to complete.

Synchronization can be a major factor in decreasing parallel speedup, because, as the previous point illustrates, the time spent waiting could have been spent in useful calculation, were synchronization not necessary.
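One way to express such a sync point, sketched here with Python's multiprocessing.Barrier (the per-task "energy" is just a random placeholder): no task may use the combined result until every task has reached the barrier.

```python
import multiprocessing as mp
import random

def worker(barrier, results, i):
    results[i] = random.random()     # stand-in for one task's energy result
    barrier.wait()                   # sync point: block until every task is done
    if i == 0:
        print("minimum energy:", min(results[:]))

if __name__ == "__main__":
    n = 4
    results = mp.Array('d', n)       # shared array of doubles
    barrier = mp.Barrier(n)
    procs = [mp.Process(target=worker, args=(barrier, results, i)) for i in range(n)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```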

    Parallel Overhead

Time to start a task. This involves, among other things:
identifying the task
locating a processor to run it
loading the task onto the processor
putting whatever data the task needs onto the processor
actually starting the task

Time to terminate a task. Termination isn't a simple chore, either: at the very least, results have to be combined or transferred, and operating system resources have to be freed before the processor can be used for other tasks.

    Synchronization time, as previously explained.


    Parallel Program Design :

    !"First we cover the ideal goals for a parallel solution. We review functional and

    data parallelism, and SPMD and Master Worker.!"Then we walk through 5 problem examples showing diagrams of possible parallel

    solutions.

    !"Problems faced in prallel programming

    Goals (ideal)

Ideal (read: unrealistic) goals for writing a program with maximum speedup and scalability:

Each process has a unique bit of work to do, and does not have to redo any other work in order to get its bit done.

    Each process stores the data needed to accomplish that work, and does not require anyone

    else's data.

    A given piece of data exists only on one process, and each bit of computation only needs

    to be done once, by one process.

    Communication between processes is minimized.

    Load is balanced; each process should be finished at the same time.

Usually it is much more complicated than this! Keep in mind that:

    There may be several parallel solutions to your problem.

    The best parallel solution may not flow directly from the best serial solution.


    Major Decisions


    Functional Parallelism?

    Partition by task (functional parallelism)

    Each process performs a different "function" or executes a

    different code section

    First identify functions, then look at the data requirements

    Data Parallelism?

    Each process does the same work on a unique piece of data

    "Owner computes" First divide the data. Each process then becomes responsible for

    whatever work is needed to process that data.

    Data placement is an essential part of a data-parallel algorithm

    Data parallelism is probably more scalable than functional parallelism


    Distributed memory programming models

Distributed memory architectures are fertile grounds for the use of many different styles of parallel programming, from those emphasizing homogeneity of process but heterogeneity of data, to full heterogeneity of both.

Data parallel

Many significant problems, over the entire computational complexity scale, fall into the data parallel model, which basically stands for "do the same thing to all this data":

Explicit data distribution (via directives)

The data is assumed to have some form of regularity, some geometric shape or other such characteristic by which it may be subdivided among the available processors, usually by use of directives commonly hidden from the executable code within program comment statements.

Single thread of control

Each processor in the distributed environment is loaded with a copy of the same code, hence single thread of control; it is not necessary, nor expected, that all processors will be synchronized in their execution of this code, although the amount of instruction separation is generally kept as small as possible in order to, among other things, maintain high levels of processor efficiency (i.e., if some processors have much more work to do than others, even though they're all running off the same code, then it'll turn out that some processors get finished long before the others do, and will simply be sitting there spinning, soaking up cycles and research bucks, until the other processors complete their tasks ... this is known as load imbalance, and we'll talk more about this later, but it should be obvious even now that it is a bad thing).

Examples: HPF

High Performance Fortran (HPF) is a standard in this sort of work.


    Key principles in explicit message passing programming

We're now going to discuss some general issues relevant to the construction of well-designed distributed applications which rely on explicit message passing for data and control communications. These principles are largely concerned with issues you should be focusing on as you consider the parallelization of your application:

How is memory going to be used, and from where?
How will the different parts of the application be coordinated?
What kinds of operations can be done collectively?
When should communications be blocking, and when non-blocking?
What kinds of synchronization considerations need to be addressed, and when?
What kinds of common problems could be encountered, and how can they be avoided?

As has been mentioned before, and as will be mentioned again: there's no substitute for a good design ... and the worse your design, the more time you'll spend debugging it.

"It must be emphasized that the machine does not think for itself. It may exercise some degree of judgment and discrimination, but the situations in which these are required, the criteria to be applied, and the actions to be taken according to the criteria, have all to be foreseen in the program of operating instructions furnished to the machine. Use of the machine is no substitute for thought on the basic organization of a computation, only for the labour of carrying out the details of the application of that thought."

Douglas R. Hartree, Moore School lecture, Univ. of Penn., 9 July 1946

Addressability

As one module in a distributed application, knowing what you know, and, for what you don't, who to ask, is one of the central issues in message passing applications. "What you know" is the data you have resident on your own processor; "what you don't know" is anything that resides elsewhere, but that you've discovered is necessary for you to find out.

The CPU can issue load/store operations involving local memory space only.

Requests for any data stored in a remote processor's memory must be converted by the programmer or a run-time library into message passing calls which copy data between local memories.

You not only have to know that you don't know something, or that something that you used to know is now out of date and needs refreshing ... you also need to know where to go to get the latest version of the information you're interested in.

    No shared variables or atomic global updates (e.g. counters, loop indices)

Synchronization is going to cost you, because there's no easy way to quickly get this kind of information to everybody ... that's just one of the defining characteristics of this model of operation, and if its implications are too detrimental to the effectiveness of your application, that's a good enough reason to explore other alternatives.

    Communication and Synchronization

The act of communicating within a distributed computing environment is very much a team effort, and has implications beyond that of simply getting information from processor A to processor B.

On multicomputers, all interprocessor communication, including synchronization, is implemented by passing messages (copying data) between processors.

Making sure that everyone is using the right value of variable x is, without question, a very important aspect of distributed computing; but so is making sure that no one tries to use that value before the rest of the pieces are in place, a matter of synchronization. Given that the only point of connection among all of the processing elements in a distributed environment lies in the messages that are exchanged, synchronization, then, must also be a matter of message passing.

In fact, synchronization is very often seen as a separable subset of all communication traffic, more a matter of control information than data.

Keep your synchronization requirements to the absolute minimum, and code them to be lean-and-mean, so that as little time is taken up in synchronization (and consequently away from meaningful computation) as possible.

All messages must be explicitly received (sends and receives must be paired).

Just like the junk mail that piles up in your mailbox and obscures the really important stuff (like your tax return, or the latest edition of TV Guide), messages that are sent but never explicitly received are a drain on network resources.
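A minimal paired send/receive sketch, again assuming mpi4py and an MPI runtime; the payload and tag are arbitrary. The send posted by rank 0 is matched by an explicit receive on rank 1.

```python
# sendrecv_demo.py -- run with e.g. `mpiexec -n 2 python sendrecv_demo.py`
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    payload = {"x": 3.14, "note": "latest value of x"}
    comm.send(payload, dest=1, tag=42)        # every send posted here ...
elif rank == 1:
    payload = comm.recv(source=0, tag=42)     # ... is matched by an explicit receive
    print("rank 1 received:", payload)
```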


Grain Size

Grain size loosely refers to the amount of computation that is done between communication or synchronization steps. Run time scales roughly as (T + S) * equally shared load, so S, the synchronization cost, is important.

Starvation

The amount of time a processor is interrupted to report its present state. This should not be large, or the processor will not have time to compute.

Deadlock

A set of processes is deadlocked if each process in the set holds a resource that another is waiting for, and none will release what it holds until it is granted the other resources it is waiting for.

You can try to detect a deadlock and kill a process, but this requires a monitoring system.

You can make deadlock impossible if you number your resources and request resources in ascending order, as sketched below.
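A small sketch of that rule using Python threads, with two locks standing in for numbered resources: because both workers always acquire resource 1 before resource 2, a circular wait cannot arise.

```python
import threading

lock_a = threading.Lock()   # resource 1
lock_b = threading.Lock()   # resource 2

def worker(name):
    # Every worker acquires the lower-numbered resource first, so a circular
    # wait (and therefore deadlock) cannot occur.
    with lock_a:
        with lock_b:
            print(f"{name} holds both resources")

threads = [threading.Thread(target=worker, args=(f"worker-{i}",)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```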

Flooding and Throttling

For many parallel problems, the problem is broken down into further parallel tasks. This should not go so far that the number of tasks greatly exceeds the number of processors; if it does, the forward execution of the program will be severely impaired. Dynamic switching is a technique that might be used to jump between the two (generating more tasks versus throttling their creation).

Load Balancing

We can distribute the load as N/P tasks per processor, using either the floor or the ceiling. The ceiling has the advantage that one processor does not become the bottleneck, as the sketch below shows.
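A small sketch with made-up sizes (N = 10 tasks, P = 4 processors) showing why the ceiling rule avoids a single overloaded processor:

```python
import math

# Hypothetical sizes: N = 10 tasks to spread over P = 4 processors.
N, P = 10, 4

# floor(N/P) tasks everywhere, with the leftovers dumped on the last processor:
floor_chunks = [N // P] * (P - 1)
floor_chunks.append(N - sum(floor_chunks))              # [2, 2, 2, 4] -> one overloaded processor

# ceil(N/P) tasks per processor, with the last one taking whatever remains:
c = math.ceil(N / P)
ceil_chunks = [min(c, N - i * c) for i in range(P)]     # [3, 3, 3, 1] -> no single bottleneck

print("floor:", floor_chunks, "max load:", max(floor_chunks))
print("ceil: ", ceil_chunks, "max load:", max(ceil_chunks))
```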

Communication Bottlenecks

Communication is often the bottleneck of parallel computation; the question is how to remove it.

    Partitioning and Scheduling

    One of the most important tasks

    Scheduling might be static or dynamic

    Job Jar technique


    Costs of Parallel Processing

    By this point, I hope you will have gotten the joint message that:

Parallel processing can be extremely useful, but ... There Ain't No Such Thing As A Free Lunch.

Programmer's time

As the programmer, your time is largely going to be spent doing the following:

    Analyzing code for parallelism

The more significant parallelism you can find, not simply in the existing code, but even more importantly in the overall task that the code is intended to address, the more speedup you can expect to obtain for your efforts.

    Recoding

    Having discovered the places where you think parallelism will give results, you now have

    to put it in. This can be a very time-consuming process.

    Complicated debugging

Debugging a parallel application is at least an order of magnitude more infuriating, because you not only have multiple instruction streams running around doing things at the same time, you've also got information flowing amongst them all, again all at the same time, and who knows!?! what's causing the errors you're seeing?

It really is that bad. Trust me. Do whatever you can to avoid having to debug parallel code:

    consider a career change;

    hire someone else to do it;

or write the best, self-debugging, modular and error-correcting code you possibly can, the first time.

If you decide to stick with it, and follow the advice in that last point, you'll find that the time you put into writing good, well-designed code has a tremendous impact on how quickly you get it running correctly. Pay the price up front.

    and only for as long as you actually need them.