
SOFTWARE EXPLOITS OF INSTRUCTION-LEVEL PARALLELISM FOR SUPERCOMPUTERS

1S. N. TAZI, 2PRAKASH MEENA, 3ISHITA SHARMA, 4A. K. DUBEY & 5NEETU SHARMA

1, 2, 3M.Tech Scholar, Computer Engineering, Govt. Engineering College, Ajmer-305002, Rajasthan

3M.Tech Scholar, Computer Engineering, Govt. Women Engineering College, Ajmer-305002, Rajasthan

4IEEE Member, India

5Govt. Engineering College, Ajmer-305002, Rajasthan, India

ABSTRACT

For decades, hardware algorithms have dominated the field of parallel processing, but with Moore's law reaching its limit, the need for software pipelining is being felt. This area has long eluded researchers, though a significant measure of success has been obtained in graphics processing using software approaches to pipelining. This project aims at developing software to detect various kinds of data dependencies, such as data flow dependency, anti-dependency and output dependency, for a basic code block. Graphs would be generated for the various kinds of dependencies present in a code block and combined to obtain a single data dependency graph. This graph would be further processed to obtain a transitive closure graph and finally an ILP graph, which can be used to predict the possible combinations of instructions that may be executed in parallel. A scheduling algorithm would be developed to obtain an instruction schedule in the form of instruction execution start times, and the schedule obtained would be used to compute various performance metrics such as speed-up factor, efficiency, and throughput.

KEYWORDS: ILP, Dependencies, System Design, Agile, Performance Metrics

INTRODUCTION

Instruction-level parallelism (ILP) is a measure of how many operations of a computer program can be performed simultaneously. A main objective considered by the designers of compilers and processors is to identify the ILP in a program and exploit its benefits as far as possible. Programs are commonly written under a sequential execution model, in which the instructions execute one after the other in the order made explicit by the programmer. ILP allows the compiler and the processor to overlap the execution of multiple instructions, or even to change the order in which instructions are executed [1]. To achieve a high standard of performance, supercomputers use both super-pipelining and EPIC (Explicitly Parallel Instruction Computing) processors. Of the two common approaches to exploiting ILP, hardware-based and software-based, this work exploits the software-based approach. The amount of ILP that exists depends on the application: in fields such as graphics and scientific computing the available ILP is much greater than in, for example, cryptography. Micro-architectural techniques used to exploit ILP include: instruction pipelining, in which the execution of multiple instructions can be partially overlapped; superscalar and VLIW execution, closely related to explicitly parallel instruction computing, in which multiple instructions are executed in parallel on multiple execution units; and out-of-order execution, in which instructions execute in any arrangement that does not violate data dependencies, a technique independent of both pipelining and superscalar execution. Current implementations extract ILP from ordinary programs without a proper sequencing of execution. If this parallelism is instead extracted at compile time, the question becomes how to convey the appropriate information to the hardware. Instruction sets in which every instruction encodes multiple independent operations make this explicit, and the industry has repeatedly re-examined its instruction sets to control the complexity that arises from sequentially ordered instructions.

International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), ISSN 2249-6831, Vol. 2, Issue 4, Dec 2012, 19-38, © TJPRC Pvt. Ltd.


Register renaming is a technique used to avoid the accidental serialization of program operations imposed by the reuse of registers. With speculative execution, all or part of an instruction stream is executed before it is determined that the control-flow instructions leading to it will actually be taken. Branch prediction is used together with speculative execution to avoid stalling on control dependencies until they are resolved. [1]

Figure 1.1: A Canonical Five-Stage Pipeline in a RISC Machine (IF = Instruction Fetch, ID = Instruction Decode, EX = Execute, MEM = Memory Access, WB = Register Write Back) [2]

DEPENDENCIES

In computer science, a data dependency arises when an instruction or program statement refers to the data of a preceding statement. In compiler theory, dependence analysis is the technique used to discover the data dependencies between statements (or instructions). Two common types of dependency are as follows:

DATA DEPENDENCIES

Let S1 and S2 be two statements. S2 depends on S1 if:

[I(S1) ∩ O(S2)] ∪ [O(S1) ∩ I(S2)] ∪ [O(S1) ∩ O(S2)] ≠ Ø

where I(Si) represents the set of memory locations read by Si, O(Sj) represents the set of memory locations written by Sj, and there is a feasible run-time execution path from S1 to S2.

This condition is called the Bernstein condition, named after A. J. Bernstein.

Three cases exist:

• True (data) dependence: O(S1) ∩ I(S2) ≠ Ø and S1 -> S2; S1 writes something later read by S2

• Anti-dependence: I(S1) ∩ O(S2) ≠ Ø and S1 -> S2; the mirror relationship of true dependence

• Output dependence: O(S1) ∩ O(S2) ≠ Ø and S1 -> S2; both write into the same memory location

True Dependencies

A true dependence, also known as a data dependence, occurs when the current instruction depends on the result of a previous instruction.


Anti-dependence

An anti-dependence occurs when a value required by a particular instruction is updated by a later instruction.

Output Dependencies

An output dependence occurs when the final output value of a variable is affected by the order of the instructions.

A commonly used convention for the data dependencies is the following:

• Read-after-Write (true dependence)

• Write-after-Write (output dependence)

• Write-after-Read (anti-dependence)
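As an illustration of these conventions, the three dependence classes can be computed mechanically from the read and write sets of two instructions. The following sketch applies Bernstein's conditions in Java; the class and method names, and the example register sets, are our own and are not part of the software described in this paper:

```java
import java.util.HashSet;
import java.util.Set;

// Minimal sketch: classify the dependence of S2 on S1 from their
// read (I) and write (O) register sets, per Bernstein's conditions.
public class DependenceCheck {

    // True (flow) dependence: S1 writes something S2 reads (RAW).
    static boolean trueDep(Set<String> oS1, Set<String> iS2) {
        return intersects(oS1, iS2);
    }

    // Anti-dependence: S1 reads something S2 writes (WAR).
    static boolean antiDep(Set<String> iS1, Set<String> oS2) {
        return intersects(iS1, oS2);
    }

    // Output dependence: both write the same location (WAW).
    static boolean outputDep(Set<String> oS1, Set<String> oS2) {
        return intersects(oS1, oS2);
    }

    static boolean intersects(Set<String> a, Set<String> b) {
        Set<String> t = new HashSet<>(a);
        t.retainAll(b);           // t = a ∩ b
        return !t.isEmpty();
    }

    public static void main(String[] args) {
        // S1: ADD R1, R2, R3  (reads R2, R3; writes R1)
        // S2: SUB R4, R2, R1  (reads R2, R1; writes R4)
        Set<String> i1 = Set.of("R2", "R3"), o1 = Set.of("R1");
        Set<String> i2 = Set.of("R2", "R1"), o2 = Set.of("R4");
        System.out.println("true dep:   " + trueDep(o1, i2));   // true  (R1)
        System.out.println("anti dep:   " + antiDep(i1, o2));   // false
        System.out.println("output dep: " + outputDep(o1, o2)); // false
    }
}
```

Each check is a set intersection, so detecting all pairwise dependencies in a block of n instructions costs O(n²) intersections.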

CONTROL DEPENDENCY

Under the sequential execution model, the instructions of a program are executed one after another, atomically. A processor exploiting instruction-level parallelism may, however, overlap or reorder the execution of multiple instructions; doing so without considering the dependencies among the instructions risks producing wrong results, a danger known as a hazard.

We restrict ourselves to data dependencies in this project and do not deal with control dependencies. [2]

REQUIREMENT ELICITATION

Basic

Requirement elicitation captures all the information relevant to the development of the system: customer details, identification of the client's problem, and the appropriate developer for that particular problem. Requirement elicitation acts as the interface between the system specification (on the developer team's side) and the customer's record of the problem; its main motive is to focus on the customer's view of the system. [3][4]

In the requirement analysis phase, the analyst focuses on two basic things: clarifying and understanding the real problem, and finding a procedure to solve it. Automating the system and automating the development environment are common concerns; another is combining the two. Heavy systems have a lot of features, and many different tasks must be performed; one of the most common is understanding the requirements of the system. The problem analyst analyses the real meaning of the problem and its context, and requires the complete report generated by the previous analyzer to understand the system and its individual automated parts.

Proposed System

This project aims at developing software to detect various kinds of data dependencies, such as data flow dependency, anti-dependency and output dependency, for a basic code block.


Figure 1.2: ILP Application Algorithm

Graphs would be generated for the various kinds of dependencies present in a code block and combined to obtain a single data dependency graph. This graph would then be processed using certain backtracking algorithms to obtain a transitive closure graph (TCG). The TCG indicates the various kinds of dependencies and can be used to predict the possible combinations of instructions that may execute in parallel. Finally, an ILP graph would be obtained. A scheduling algorithm would be applied to obtain an instruction schedule in the form of instruction start times, and certain performance metrics would then be computed.

Specification of Software & Hardware

Processor: Intel Core 2 Duo @ 2.66 GHz
RAM: 2 GB DDR2
Hard Disk: Samsung HD161 (160 GB)
Operating system: Fedora 10 (Linux kernel version 2.6.27.5-117.fc10.i686)

The steps shown in Figure 1.2, starting from the input instructions (without ILP), are:

• Detect data flow dependency
• Detect anti-dependency
• Detect output dependency
• Obtain data dependency graph
• Obtain transitive closure graph
• Obtain architectural restrictions graph
• Obtain dependence graph
• Obtain ILP graph
• Obtain instruction schedule
• Compute performance metrics


X-Windows system: GNOME
Editor: Gedit
Development kit: JDK 1.6
Programming paradigm: Object Oriented
Programming language: Java 2 SE
Development philosophy: Agile
Process model: Scrum
Technology: Open Source
Image manipulator: GIMP

SYSTEM DESIGN

The objective of analysis modeling is to create a variety of representations that depict software requirements for information, function, and behavior. To accomplish this, two different modeling philosophies can be applied: structured analysis and object-oriented analysis. Structured analysis views software as an information transformer; it helps the software engineer to identify data objects and the relationships between them, and to model how functions transform those objects as they flow through the system. Object-oriented analysis examines a problem domain defined as a set of use-cases in an effort to extract the classes that define the problem. Each class has a set of attributes and operations. Classes are related to one another in a number of ways and are modeled using UML diagrams. The analysis is composed of four modeling elements: scenario-based, class-based, flow, and behavioral models.

Scenario-Based Modeling

This model considers the software requirements from the user's point of view. The use-case, a narrative or template-driven description of an interaction between an actor and the software, is the primary modeling element. Derived during requirement elicitation, the use-case defines the key steps for a specific function or interaction. The degree of use-case formality and detail varies, but the end result provides the necessary input to all other analysis modeling activities. Scenarios can also be described using an activity diagram, a flowchart-like graphical representation that depicts the processing flow within a specific scenario.

Figure 2: Use-Case Diagram


A use-case captures the interactions that occur between producers and consumers of information and the system itself. Requirements gathering mechanisms are used to identify stakeholders, define the scope of the problem, specify overall operational goals, outline all known functional requirements, and describe the objects that will be manipulated by the system.

Flow Modeling

Flow models focus on the flow of data objects as they are transformed by processing functions. Derived from structured analysis, flow models use the data flow diagram, a modeling notation that depicts how input is transformed into output as data objects move through a system.

Figure 3: Context-Level DFD

Each software function that transforms data is described by a process specification or narrative. In addition to data flow, this modeling element also depicts control flow, a representation that illustrates how events affect the behavior of a system. The DFD takes an input-process-output view of a system. That is, data objects flow into the software, are transformed by processing elements, and resultant data objects flow out of the software.

Figure 4: Level 1 DFD

Figure 5: Level 2 DFD that Refines the Detect Direct Dependency Process


Class-Based Modeling

Figure 6: Class Diagram

Behavioral Modeling

Data objects are represented by labeled arrows and transformations are represented by bubbles. The DFD is presented in a hierarchical fashion. That is, the first data flow model (sometimes called a level 0 DFD or context diagram) represents the system as a whole. Subsequent data flow diagrams refine the context diagram, providing increasing detail with each subsequent level.

Figure 7: Sequence Diagram

SYSTEM ANALYSIS

Agile Design Philosophy

Agile is a philosophy, a set of guidelines for building software. It encourages customer satisfaction and early delivery of software; it motivates the development team and minimizes the work done on the software product. These guidelines push both analysts and developers toward better communication with the client.


Manifesto for agile software development:

We are uncovering better ways of developing software by doing it and helping others do it. Through this work we have come to value:

• Individuals and interactions over processes and tools

• Working software over comprehensive documentation

• Customer collaboration over contract negotiation

• Responding to change over following a plan

That is, while there is value in the items on the right, we value the items on the left more. [3][4]

Software engineers and other project stakeholders work together on an agile team: a team that is self-organizing and in control of its own destiny.

Figure 8: Agile vs. Waterfall

An agile team fosters communication and collaboration among all who serve on it. Agile development may be best termed "software engineering lite." The basic framework activities of customer communication, planning, modeling, construction, delivery and evaluation remain, but they morph into a minimal task set that pushes the project team toward construction and delivery (some argue that this is done at the expense of problem analysis and solution design). Customers and software engineers who have adopted the agile philosophy share the same view: the only really important work product is an operational "software increment" that is delivered to the customer on the appropriate commitment date.

The Agile Alliance defines 12 principles for those who want to achieve agility [5]:

1. Our highest priority is to satisfy the customer through early and continuous delivery of valuable software.

2. Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.

3. Deliver working software frequently, from a couple of weeks to a couple of months, with a preference for the shorter timescale.

4. Business people and developers must work together daily throughout the project.

5. Build projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.

6. The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

7. Working software is the primary measure of progress.

8. Agile processes promote sustainable development. The sponsors, customers, and developers should be able to maintain a constant pace indefinitely.

9. Continuous attention to technical excellence and good design enhances agility.

10. Simplicity, the art of maximizing the amount of work not done, is essential.

11. The best architectures, requirements, and designs emerge from self-organizing teams.

12. At regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly.

Agility can be applied to any software process. However, to accomplish this, it is essential that the process be designed in a way that allows the project team to adapt tasks and to streamline them; conduct planning in a way that understands the fluidity of an agile development approach; eliminate all but the most essential work products and keep them lean; and emphasize an incremental delivery strategy that gets working software to the customer as rapidly as feasible for the product type and operational environment. [10]

Any agile software process is characterized in a manner that addresses three key assumptions about the majority of software projects [6][7][11]:

• It is difficult to predict in advance which software requirements will persist and which will change. It is equally difficult to predict how customer priorities will change as a project proceeds.

• For many types of software, design and construction are interleaved. That is, both activities should be performed in tandem so that design models are proven as they are created. It is difficult to predict how much design is necessary before construction is used to prove the design.

• Analysis, design, construction, and testing are not as predictable (from a planning point of view) as we might like.

A number of key traits must exist among the people on an agile team [8][9]:

• Common focus
• Competence
• Collaboration
• Decision-making ability
• Fuzzy problem-solving ability
• Mutual trust and respect
• Self-organization


Scrum Process Model [13]

Scrum is an agile process model that was developed by Jeff Sutherland and his team in the early 1990s. In recent years, further development of the Scrum methods has been performed by Schwaber and Beedle. The Scrum principles are consistent with the agile manifesto.

Figure 9: Scrum[13]

Scrum emphasizes the use of a set of "software process patterns" that have proven effective for projects with tight timelines, changing requirements, and business criticality. Each of these process patterns defines a set of development activities:

Backlog: a prioritized list of project requirements, ordered by business value, from which the highest-priority items are chosen. Items can be added to the backlog at any time (this is how changes are introduced). The project manager assesses the backlog and updates the priorities as required.

Figure 10: Prevalence of Scrum[13]


Sprints: work units drawn from the highest-priority backlog items, each of which must be completed within a predefined deadline. During the sprint, the backlog items that the sprint work units address are frozen (i.e., changes are not introduced during the sprint). Hence, the sprint allows the team members to work in a short-term but stable environment.

Scrum meetings: short meetings held daily by the Scrum teams, in which three key questions are asked and answered by all team members:

What did you do since the last meeting?

What obstacles are you encountering?

What do you plan to accomplish by the next team meeting?

Demos: deliver the software increment to the customer so that functionality that has been implemented can be demonstrated and evaluated by the customer. It is important to note that the demo may not contain all planned functionality, but rather those functions that can be delivered within the time-box that was established.

CONSTRUCTION

Base

The programming paradigm used for coding is object oriented. It provides ease of development through constructs like classes, constructors, inheritance, interfaces, encapsulation, and packages. Java provides a rich set of language features: pre-defined classes and methods in the form of packages, interfaces for establishing guidelines for methods, data hiding, event handling with the AWT, etc.

The user interface for this software has been designed using Swing, which provides lightweight components compared to the AWT; the use of the AWT in this software is restricted to event handling. Java gives this software its platform-independent form, and security is ensured by the sandbox model of the JVM. Packages most prominently used in the development of this software include javax.swing, java.awt, java.util, java.awt.geom, and java.awt.event. The interfaces used in this software include the ActionListener and Runnable interfaces.

Transitive Closure Graph

It is the summation of both the direct and indirect dependencies. Given an n-vertex digraph G, we construct its transitive closure as another n-vertex digraph H by adding edges to G according to the following rule: in H, add an edge (i, j) directed from vertex i to vertex j if, and only if, there is a directed path (of any length 1, 2, 3, …, n-1) from i to j in G. To compute the transitive closure of G in Θ(n³) time while saving time and space in practice, we substitute the logical operations ∨ (logical OR) and ∧ (logical AND) for the arithmetic operations min and + in the Floyd-Warshall algorithm [12]. For i, j, k = 1, 2, 3, …, n, we construct the transitive closure G* = (V, E*) by putting edge (i, j) into E* if and only if t_ij(n) = 1, where:


t_ij(0) = 1 if i = j or (i, j) ∈ E, and 0 if i ≠ j and (i, j) ∉ E; and, for k ≥ 1,

t_ij(k) = t_ij(k-1) ∨ (t_ik(k-1) ∧ t_kj(k-1)).

Transitive-Closure(G)
  n ← |V[G]|
  for i ← 1 to n
    do for j ← 1 to n
      do if i = j or (i, j) ∈ E[G]
        then t_ij(0) ← 1
        else t_ij(0) ← 0
  for k ← 1 to n
    do for i ← 1 to n
      do for j ← 1 to n
        do t_ij(k) ← t_ij(k-1) ∨ (t_ik(k-1) ∧ t_kj(k-1))
  return T(n)
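The pseudocode above translates directly into Java. The following is a minimal sketch (the adjacency-matrix representation and the class name are our own choices), which collapses the sequence of matrices T(0), …, T(n) into a single boolean matrix updated in place, a standard space optimization of the boolean Floyd-Warshall algorithm:

```java
// Sketch of the boolean Floyd-Warshall transitive closure above:
// adj[i][j] is true when edge (i, j) is in the dependency graph.
public class TransitiveClosure {

    static boolean[][] closure(boolean[][] adj) {
        int n = adj.length;
        boolean[][] t = new boolean[n][n];
        // t(0): 1 if i = j or (i, j) is an edge of G
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                t[i][j] = (i == j) || adj[i][j];
        // t(k) = t(k-1) OR (t_ik(k-1) AND t_kj(k-1)), updated in place
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    t[i][j] = t[i][j] || (t[i][k] && t[k][j]);
        return t;
    }

    public static void main(String[] args) {
        // Path 0 -> 1 -> 2, so the closure adds the indirect edge (0, 2).
        boolean[][] g = new boolean[3][3];
        g[0][1] = true;
        g[1][2] = true;
        boolean[][] t = closure(g);
        System.out.println("0 -> 2 in closure: " + t[0][2]); // true
    }
}
```

The triple loop makes the Θ(n³) running time of the text explicit; substituting ||/&& for min/+ is exactly the substitution described above.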

Scheduling Algorithm

Schedule(T, index)
  unscheduled_count := index
  initialize inst_state to 0
  initialize pipeline_stage to 0
  while unscheduled_count > 0
    do if stage = EMPTY
        then sel_stage ← stage
      for j ← 1 to index
        if inst_state = UNPROCESSED
          then while a dependency or an unprocessed predecessor exists
              if sched_condition
                break
          else stage = OCCUPIED
            sel_stage = stage_no
            inst_state := PROCESSED
            time_array[index] := clock
      update stage counters and clock
  return time_array

• T is the ILP graph

• time_array is an array that stores the execution start times of instructions

• sel_stage represents the pipeline stage to which an instruction has been supplied

• index denotes the total number of instructions

• inst_state denotes whether an instruction has been scheduled or not

• stage denotes whether a pipeline stage is empty or occupied

Instruction Set

This software operates on a basic code block written in a generic instruction set. All instructions are assumed to take five clock cycles.

Transfer Instructions Like

MOV RD , RS

MVI R, 8-BIT

OUT [ADDRESS]

IN [ADDRESS]

Arithmetic Instructions Like

ADD R

ADI 8-BIT

SUB R

SUI 8-BIT

INR R

DCR R

Logic Instructions Like

ANA R

ANI 8-BIT

ORA R


ORI 8-BIT

XRA R

XRI 8-BIT

Machine Control Instructions Like

HLT

NOP

Notes:

• Instructions with an implicit register operand use the accumulator.

• Instructions like INR are presumed to both use and modify the associated register.

• Branch instructions like JMP have not been scheduled because we have not dealt with control dependencies at this stage of the project.

• GIGO, being a fundamental law of computer science, is also applicable here: this software has no explicit error handling facility.

Architectural Restrictions

Some processors place restrictions on which instructions can be combined in parallel. Such architectural restrictions may be represented by an architectural restrictions graph, which depicts which instructions cannot be combined in parallel. We have considered the following architectural restrictions in this software:

• ADD – MOV
• ADDF – MULF
• SUBF – DIVF
• SUB – MOV
• INR – DIV
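A restricted-pair check reduces to a set lookup. The following sketch (the string pair encoding and class name are our own assumptions) treats each restriction as symmetric, so that ADD – MOV also forbids MOV – ADD:

```java
import java.util.Set;

// Sketch: architectural restrictions as a set of opcode pairs that may
// not be issued in the same cycle (the "A|B" encoding is an assumption).
public class Restrictions {

    private static final Set<String> FORBIDDEN = Set.of(
        "ADD|MOV", "ADDF|MULF", "SUBF|DIVF", "SUB|MOV", "INR|DIV");

    // Order-insensitive lookup: the restriction applies either way round.
    static boolean canPair(String a, String b) {
        return !FORBIDDEN.contains(a + "|" + b)
            && !FORBIDDEN.contains(b + "|" + a);
    }

    public static void main(String[] args) {
        System.out.println(canPair("MOV", "ADD"));   // false - restricted pair
        System.out.println(canPair("ADDF", "DIVF")); // true
    }
}
```

A scheduler can consult such a predicate before placing two otherwise independent instructions into the same issue slot.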

Performance Metrics

For a K-stage linear pipeline processor with clock period τ, the speed-up factor, efficiency, and throughput are computed (see Figure 19).
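The expressions themselves are not reproduced in the extracted text; the standard textbook definitions for a k-stage linear pipeline executing n instructions are: total time T = (k + n - 1)τ, speed-up S = nk / (k + n - 1), efficiency E = S / k, and throughput H = n / ((k + n - 1)τ). A small Java sketch of these definitions (the class name is our own):

```java
// Standard metrics for a k-stage linear pipeline executing n instructions
// with clock period tau (textbook formulas; the paper's own expressions
// appear in a figure not reproduced in the extracted text).
public class PipelineMetrics {

    // Pipelined execution takes (k + n - 1) clock cycles in total.
    static double speedup(int k, int n) {
        return (double) (n * k) / (k + n - 1);
    }

    static double efficiency(int k, int n) {
        return speedup(k, n) / k;        // fraction of the ideal k-fold speed-up
    }

    static double throughput(int k, int n, double tau) {
        return n / ((k + n - 1) * tau);  // instructions completed per unit time
    }

    public static void main(String[] args) {
        int k = 5, n = 8;                // five stages, eight instructions
        double tau = 1.0;                // one time unit per clock
        System.out.printf("speedup    = %.3f%n", speedup(k, n));
        System.out.printf("efficiency = %.3f%n", efficiency(k, n));
        System.out.printf("throughput = %.3f%n", throughput(k, n, tau));
    }
}
```

With k = 5 and n = 8, as in the test case below, the speed-up is 40/12 ≈ 3.33 and the efficiency is 8/12 ≈ 0.67.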


Testing

This software has been tested using a modular testing approach. Finally, the integrated product has been tested as a single unit and the detected flaws have been removed. It has been tested on both the Linux and Windows platforms for consistent performance and the absence of errors of any sort.

Let us consider a test case to understand the working of the software. The code sequence is given below:

ADDF R1 R2 R3

SUB R4 R2 R1

MOV R2 PORT#1

INR R4

DCR R1

ORA R2

DIV R7 R5 R3

MULF R6 R8 R9

The code consists of a block of eight instructions. The instructions may be defined as:

ADDF – floating-point add the contents of R2 and R3 and store in R1

SUB – subtract the contents of R1 from R2 and store the result in R4

MOV – move the data from port#1 to R2

INR – increment register R4

DCR – decrement register R1

ORA – perform an OR operation over the contents of R2 and accumulator

DIV – divide R5 by R3 and store result in R7

MULF – floating-point multiply R8 and R9 and store result in R6
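The direct data-flow dependencies of this block can be reproduced mechanically from each instruction's read and write register sets. In the sketch below, the register sets follow the instruction definitions above; the accumulator operand of ORA (written "A") is our assumption, and the class and record names are our own:

```java
import java.util.List;
import java.util.Set;

// Sketch: derive the direct data-flow (RAW) edges for the eight-instruction
// test block from each instruction's read and write register sets.
public class TestBlock {

    record Inst(String op, Set<String> reads, Set<String> writes) {}

    static final List<Inst> BLOCK = List.of(
        new Inst("ADDF", Set.of("R2", "R3"), Set.of("R1")),
        new Inst("SUB",  Set.of("R1", "R2"), Set.of("R4")),
        new Inst("MOV",  Set.of(),           Set.of("R2")),  // from PORT#1
        new Inst("INR",  Set.of("R4"),       Set.of("R4")),
        new Inst("DCR",  Set.of("R1"),       Set.of("R1")),
        new Inst("ORA",  Set.of("R2", "A"),  Set.of("A")),   // A = accumulator
        new Inst("DIV",  Set.of("R5", "R3"), Set.of("R7")),
        new Inst("MULF", Set.of("R8", "R9"), Set.of("R6")));

    // Edge i -> j when instruction i writes a register that j later reads.
    static boolean raw(int i, int j) {
        return BLOCK.get(j).reads().stream()
                    .anyMatch(r -> BLOCK.get(i).writes().contains(r));
    }

    public static void main(String[] args) {
        for (int i = 0; i < BLOCK.size(); i++)
            for (int j = i + 1; j < BLOCK.size(); j++)
                if (raw(i, j))
                    System.out.println(BLOCK.get(i).op() + " -> " + BLOCK.get(j).op());
    }
}
```

The same pairwise scan with the anti- and output-dependence checks yields the remaining edges that are merged into the data dependency graph of Figure 14.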

Figure 11: Data Flow Dependency Graph

Figure 12: Anti-Dependency Graph


Figure 13: Output Dependency Graph

Figure 14: Data Dependency Graph

Figure 15: Transitive Closure Graph

Figure 16: Architectural Restrictions Graph

Figure 17: Dependence Graph

Figure 18: ILP Graph


Figure 19: Performance Metrics

DEPLOYMENT

System Implementation

When the theoretical design is turned into a working system, that stage is known as the implementation of the project. It is therefore the most critical stage in achieving a successful new system and in giving the user confidence that the new system will work properly and be effective. The implementation stage involves investigation of the existing system, careful planning around implementation constraints, and selection and judgment of conversion methods. Although the software was developed on the Linux platform, it has been deployed on the Windows platform as well; this portability follows from the platform independence of Java. The software has been tested on both platforms for consistent performance. The final working software has been packaged by assembling all the required class files into a JAR archive. The delivered software provides benefits for the end user, and it also provides useful feedback for the software team: end users comment on characteristics such as reliability and user-friendliness, and on the software's functions and features. This feedback should be collected and recorded by the software team and used to:

• Make immediate modifications to the delivered increment (if required)

• Define changes to be incorporated into the next planned increment

• Make necessary design modifications to accommodate the changes

• Revise the plan for the next increment to reflect the changes

CONCLUSIONS AND FUTURE SCOPE

Conclusions

For decades, hardware algorithms such as the Tomasulo algorithm for the IBM System/360's FPU and scoreboarding for the CDC 6600 have dominated pipelining in processors. But with Moore's law reaching its limit, it is no longer feasible to depend purely on hardware pipelining. A paradigm shift is expected in the near future from hardware-centric approaches to software-oriented approaches for exploiting instruction-level parallelism. Intel, IBM, AMD, and other companies have already begun intense research in this field. An area where this approach has found significant application is graphics processing, as graphics data contains a considerable amount of redundancy and



parallelism. A prominent example is Graphics Processing Unit (GPU) technology, which relies heavily on software approaches to pipelining.

Another example is the Itanium processor developed by the Intel Corporation, which has found a very significant application as the processor for the Intel-based supercomputer at NASA. Itanium offers features such as software pipelining for loop optimization, rotating registers, and speculative branch prediction. This is a field of intense research that provides ample opportunities for developers and scientists, while also presenting significant challenges for system programmers.

Future Scope

The project has covered almost all the requirements initially laid out. Further requirements and improvements can easily be incorporated, since the code is largely modular in nature. The agile nature of the project provides scope for easy accommodation of changes and emerging requirements. Some possible extensions are:

• GCD tests before computing dependencies

• Use of an expanded instruction set

• Inclusion of control dependencies to extend the software to handle complex branching code blocks

• Application of global code scheduling algorithms such as trace scheduling

• Refinement of the scheduling algorithm to handle resource dependencies
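The GCD test listed above can be sketched as follows. This is a hedged illustration of the classical test, not part of the delivered software: for a write to A[a*i + c1] and a read of A[b*j + c2], an integer solution of a*i - b*j = c2 - c1 (and hence a possible dependence) can exist only if gcd(a, b) divides c2 - c1. The function name and parameters are assumptions for illustration.

```python
from math import gcd

def gcd_test(a, b, c1, c2):
    """Classical GCD dependence test for accesses A[a*i + c1] and
    A[b*j + c2]: returns True if a dependence is possible, False if
    the test proves independence (gcd(a, b) does not divide c2 - c1)."""
    return (c2 - c1) % gcd(a, b) == 0
```

For example, A[2*i] and A[2*i + 1] touch even and odd indices respectively, so gcd(2, 2) = 2 does not divide 1 and the test rules the dependence out.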

REFERENCES

1. Sumanta (2011). "Hardware and software approaches for instruction-level parallelism". Yahoo Answers.

2. John L. Hennessy and David A. Patterson (2003). Computer Architecture: A Quantitative Approach (3rd ed.). Morgan Kaufmann. ISBN 1-55860-724-2.

3. Beck, Kent; et al. (2001). "Manifesto for Agile Software Development". Agile Alliance. Retrieved 14 June 2010.

4. Ambler, S. W. "Examining the Agile Manifesto". Retrieved 6 April 2011.

5. Beck, Kent; et al. "Principles behind the Agile Manifesto". Agile Alliance. Archived from the original on 14 June 2010. Retrieved 6 June 2010.

6. Black, S. E.; Boca, P. P.; Bowen, J. P.; Gorman, J.; Hinchey, M. G. "Formal versus agile: Survival of the fittest". IEEE Computer 49(9): 39–45, September 2009.

7. Boehm, B.; Turner, R. Balancing Agility and Discipline: A Guide for the Perplexed. Boston, MA: Addison-Wesley. ISBN 0-321-18612-5. Appendix A, pages 165–194.

8. Mark Seuffert, Piratson Technologies, Sweden. "Karlskrona test, a generic agile adoption test". Piratson.se. Retrieved 6 June 2010.

9. "How agile are you, a Scrum-specific test". Agile-software-development.com. Retrieved 6 June 2010.

10. Tim Rosenblatt (25 August 2010). "Agile principle 11: The best architectures, requirements and designs emerge from self-organizing teams". http://www.cloudspace.com/blog/2010/08/25/agile-principle-11-the-best-architectures-requirements-and-designs-emerge-from-self-organizing-teams/

11. Roger S. Pressman. Software Engineering: A Practitioner's Approach, chapter 4.

12. "The Floyd–Warshall algorithm". http://serverbob.3x.ro/IA/DDU0157.html

13. Ken Schwaber and Mike Beedle. Agile Software Development with Scrum.

14. Paolo Faraboschi, Joseph A. Fisher and Cliff Young. "Instruction Scheduling for Instruction Level Parallel Processors". Proceedings of the IEEE, Vol. 89, No. 11, November 2001.

15. Rainer Leupers. "Exploiting Conditional Instructions in Code Generation for Embedded VLIW Processors".

16. Alexandru Nicolau and Joseph A. Fisher. "Measuring the Parallelism Available for Very Long Instruction Word Architectures". IEEE Transactions on Computers, Vol. C-33, No. 11, November 1984.

17. Lei Wang and Gui Chen. "Architecture-dependent Register Allocation and Instruction Scheduling on VLIW". IEEE, 2010.

18. Kai Hwang. Advanced Computer Architecture: Parallelism, Scalability, Programmability.

19. Faye A. Briggs and Kai Hwang. Computer Architecture and Parallel Processing.
