CMPUT429/CMPE382 Amaral 1/17/01 CMPUT429/CMPE382 Winter 2001 Topic9: Software Pipelining (Some...


CMPUT429/CMPE382 Amaral 1/17/01

CMPUT429/CMPE382 Winter 2001

Topic 9: Software Pipelining

(Some slides from David A. Patterson’s CS252, Spring 2001 Lecture Slides)

Another possibility: Software Pipelining

• Observation: if the iterations of a loop are independent, then we can get more ILP by scheduling instructions from different iterations together

• Software pipelining: reorganizes a loop so that each iteration of the new loop is made from instructions chosen from different iterations of the original loop

[Figure: Iterations 0 through 4 of the original loop; one software-pipelined iteration takes instructions from several of them]
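The idea in the bullets above can be sketched as a small schedule generator. This is only an illustration under assumptions: the three stage names (load, add, store) and the function name `software_pipeline_schedule` are hypothetical, and every stage is assumed to take one pass through the loop. Each row shows which original iteration each stage works on during one pass of the software-pipelined loop.

```python
def software_pipeline_schedule(n_iters, stages=("load", "add", "store")):
    """Return one row per pass through the pipelined loop.

    Each row maps a stage name to the original iteration it works on in
    that pass, or None when the stage is idle (pipeline filling or draining).
    """
    depth = len(stages)
    rows = []
    for t in range(n_iters + depth - 1):
        row = {}
        for s, name in enumerate(stages):
            it = t - s  # stage s lags s passes behind the first stage
            row[name] = it if 0 <= it < n_iters else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    for t, row in enumerate(software_pipeline_schedule(5)):
        print(t, row)
```

In the middle rows every stage is busy with a different original iteration; the first and last rows, where some stages are idle, correspond to filling and draining the pipeline.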

Software Pipelining Example

Before: Unrolled 3 times

LOOP: L.D    F0,0(R1)
      ADD.D  F4,F0,F2
      S.D    0(R1),F4
      L.D    F6,-8(R1)
      ADD.D  F8,F6,F2
      S.D    -8(R1),F8
      L.D    F10,-16(R1)
      ADD.D  F12,F10,F2
      S.D    -16(R1),F12
      DSUBUI R1,R1,#24
      BNEZ   R1,LOOP

After: Software Pipelined

LOOP: S.D    0(R1),F4    ; Stores M[i]
      ADD.D  F4,F0,F2    ; Adds to M[i-1]
      L.D    F0,-16(R1)  ; Loads M[i-2]
      DSUBUI R1,R1,#8
      BNEZ   R1,LOOP

• Symbolic Loop Unrolling
  – Maximizes result-use distance
  – Less code space than unrolling
  – Fills and drains the pipe only once per loop, vs. once per unrolled iteration in loop unrolling

[Figure: number of overlapped ops vs. time, for the software-pipelined loop and for the unrolled loop]

5 cycles per iteration

Software Pipelining Example

Before: Unrolled 3 times (same listing as on the previous slide)

After: Software Pipelined

      L.D    F0,0(R1)
      ADD.D  F4,F0,F2
      L.D    F0,-8(R1)
------------------------------------
L:    S.D    0(R1),F4    ; Stores M[i]
      ADD.D  F4,F0,F2    ; Adds to M[i-1]
      L.D    F0,-16(R1)  ; Loads M[i-2]
      DSUBUI R1,R1,#8
      BNEZ   R1,L
------------------------------------
      S.D    -8(R1),F4
      ADD.D  F4,F0,F2
      S.D    -16(R1),F4
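The start-up code, the loop body, and the wind-down code above can be checked with a short simulation. This is a minimal sketch under assumptions, not the slide's exact code: the function names are hypothetical, `i` stands in for R1 but is an element index rather than a byte address, the loop bound is chosen so that no load goes out of range, and the wind-down stores are written relative to the index after the final decrement.

```python
def add_scalar_reference(mem, s):
    """Original loop: x[i] = x[i] + s for every element."""
    return [v + s for v in mem]


def add_scalar_software_pipelined(mem, s):
    """Software-pipelined version with start-up, loop body, and wind-down.

    f0 and f4 play the roles of F0 and F4; i plays the role of R1 but is an
    element index (an assumption of this sketch), counting down from the
    last element. Requires at least two elements.
    """
    m = list(mem)
    if len(m) < 2:
        raise ValueError("this sketch needs at least two elements")
    i = len(m) - 1
    # Start-up: begin the first two iterations.
    f0 = m[i]          # L.D   F0,0(R1)
    f4 = f0 + s        # ADD.D F4,F0,F2
    f0 = m[i - 1]      # L.D   F0,-8(R1)
    # Loop body: store for element i, add for i-1, load for i-2.
    while i >= 2:
        m[i] = f4      # S.D   0(R1),F4
        f4 = f0 + s    # ADD.D F4,F0,F2
        f0 = m[i - 2]  # L.D   F0,-16(R1)
        i -= 1         # DSUBUI R1,R1,#8
    # Wind-down: finish the two iterations still in flight.
    m[i] = f4
    f4 = f0 + s
    m[i - 1] = f4
    return m


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(add_scalar_software_pipelined(data, 10.0))
```

Both functions produce the same result for any list of two or more elements, which is the property that makes the transformation legal when the iterations are independent.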

Software Pipelining Example: execution trace (animation frames)

Registers: F0, F2 (holds the scalar s), F4; R1 points to the current element.
Memory: X[1000], X[999], X[998], X[997], ... at addresses 0xFF00, 0xFEE8, 0xFEE0, 0xFED8, ...

• L.D F0,0(R1): F0 = X[1000]
• ADD.D F4,F0,F2: F4 = T1 = X[1000] + s
• L.D F0,-8(R1): F0 = X[999]
• S.D 0(R1),F4: T1 is stored to X[1000]
• ADD.D F4,F0,F2: F4 = T2 = X[999] + s
• L.D F0,-16(R1): F0 = X[998]
• DSUBUI R1,R1,#8 and BNEZ R1,L: R1 now points to X[999]
• S.D 0(R1),F4: T2 is stored to X[999]

Software Pipelining Example in the IA-64

loop:
(p16) ld1 r32 = [r12], 1
(p17) add r34 = 1, r33
(p18) st1 [r13] = r35, 1
      br.ctop loop

The following frames trace this loop over five source elements x1 ... x5 in memory, producing y1 ... y5. The figure shows the predicate registers p16-p18, the loop count (LC), the epilog count (EC), the register rename base (RRB), and the rotating general registers r32-r39 in both physical and logical numbering.

• Initial state: p16 = 1, p17 = 0, p18 = 0; LC = 4; EC = 3; RRB = 0
• Pass 1: only the load is enabled; x1 is loaded into r32

Software Pipelining Example in the IA-64 (after the first br.ctop)

• RRB = -1; LC = 3; EC = 3; p16 = 1, p17 = 1, p18 = 0
• The physical register holding x1 is now addressed as logical r33
• Pass 2: x2 is loaded into r32; the add computes y1 from x1 and writes it to r34
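The renaming shown in these frames can be sketched as a mapping from logical to physical register numbers. This is an illustration under assumptions: the function name `physical_register` is hypothetical, the rotating region is taken to be the eight registers r32-r39 drawn on the slide, and RRB is kept as a plain negative integer as the slide displays it.

```python
def physical_register(logical, rrb, base=32, size=8):
    """Map a logical rotating register number to a physical one.

    base and size describe the rotating region drawn on the slide
    (r32..r39); registers outside that region are returned unchanged.
    """
    if not base <= logical < base + size:
        return logical  # static registers are not renamed
    return base + (logical - base + rrb) % size


if __name__ == "__main__":
    # The register written as r32 with RRB = 0 is read as r33 with RRB = -1.
    print(physical_register(32, 0), physical_register(33, -1))
```

Because each br.ctop decrements RRB, a value written to r32 in one pass is visible as r33 in the next pass and as r34 in the one after that, so no copy instructions are needed to move data between pipeline stages.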

Software Pipelining Example in the IA-64 (after the second br.ctop)

• RRB = -2; LC = 2; EC = 3; p16 = 1, p17 = 1, p18 = 1
• Pass 3: x3 is loaded; y2 is computed; y1 is stored to memory

Software Pipelining Example in the IA-64 (after the third br.ctop)

• RRB = -3; LC = 1; EC = 3; p16 = 1, p17 = 1, p18 = 1
• Pass 4: x4 is loaded; y3 is computed; y2 is stored to memory

Software Pipelining Example in the IA-64 (after the fourth br.ctop)

• RRB = -4; LC = 0; EC = 3; p16 = 1, p17 = 1, p18 = 1
• Pass 5: x5 is loaded; y4 is computed; y3 is stored to memory

Software Pipelining Example in the IA-64 (after the fifth br.ctop)

• RRB = -5; LC = 0; EC = 2; p16 = 0, p17 = 1, p18 = 1
• Pass 6: the load is disabled; y5 is computed; y4 is stored to memory

Software Pipelining Example in the IA-64 (after the sixth br.ctop)

• RRB = -6; LC = 0; EC = 1; p16 = 0, p17 = 0, p18 = 1
• Pass 7: the load and the add are disabled; y5 is stored to memory

Software Pipelining Example in the IA-64 (after the seventh br.ctop)

• RRB = -7; LC = 0; EC = 0; p16 = 0, p17 = 0, p18 = 0
• The branch falls through and the loop exits; memory holds y1 ... y5
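The whole walkthrough can be reproduced with a small simulation of the three-instruction kernel. This is a simplified sketch under assumptions, not a model of real hardware: the function name `run_ctop_loop` is hypothetical, only the three predicates p16-p18 and the eight rotating registers r32-r39 from the slide are modeled, RRB is kept as a plain negative integer as on the slide, and the memory pointers r12 and r13 are replaced by a list index and a list append.

```python
def run_ctop_loop(x, stages=3):
    """Simulate the ld1/add/st1 kernel closed by br.ctop.

    Returns (y, kernel_passes, final_rrb), where y[i] == x[i] + 1.
    """
    if not x:
        raise ValueError("this sketch needs at least one element")
    size = 8                      # rotating general registers r32..r39
    gr = [None] * size            # physical rotating registers
    pr = {16: 1, 17: 0, 18: 0}    # stage predicates before the first pass
    lc, ec, rrb = len(x) - 1, stages, 0
    src = 0                       # stand-in for the r12 load pointer
    y = []                        # stand-in for memory written through r13
    passes = 0

    def phys(logical):
        # Logical-to-physical renaming inside the rotating region.
        return (logical - 32 + rrb) % size

    while True:
        passes += 1
        if pr[16]:                            # (p16) ld1 r32 = [r12], 1
            gr[phys(32)] = x[src]
            src += 1
        if pr[17]:                            # (p17) add r34 = 1, r33
            gr[phys(34)] = 1 + gr[phys(33)]
        if pr[18]:                            # (p18) st1 [r13] = r35, 1
            y.append(gr[phys(35)])

        # br.ctop: pick the next p16, update LC or EC, rotate, maybe branch.
        if lc > 0:
            lc -= 1
            new_p16, taken = 1, True          # another iteration starts
        elif ec > 1:
            ec -= 1
            new_p16, taken = 0, True          # draining the pipeline
        else:
            ec = 0
            new_p16, taken = 0, False         # last pass, fall through
        pr = {16: new_p16, 17: pr[16], 18: pr[17]}
        rrb -= 1
        if not taken:
            return y, passes, rrb


if __name__ == "__main__":
    print(run_ctop_loop([10, 20, 30, 40, 50]))
```

With five elements and three stages the kernel runs LC + EC = 4 + 3 = 7 times, and the final values of LC, EC, RRB, and the predicates match the last frame of the walkthrough.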