Just-In-Time Software Pipelining Hongbo Rong Hyunchul Park Youfeng Wu Cheng Wang Programming Systems Lab Intel Labs, Santa Clara

Page 1: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Just-In-Time Software Pipelining

Hongbo Rong Hyunchul Park

Youfeng Wu Cheng Wang

Programming Systems Lab

Intel Labs, Santa Clara

What is software pipelining?

A loop optimization exposing instruction-level parallelism (ILP)

for (i = 0; i < N; i++) {
    a: x = y + 1
    b: y = A[i] + x
    c: B[i+2] = B[i] * x
    d: A[i+2] = B[i+2]
}

[Figure: dependence graph over operations a, b, c, d, with local dependence and loop-carried dependence edges; edge labels <2>, <1>, <1>]

[Figure: pipelined schedule. Iterations 0 to 3 each execute a, b, c, d, overlapped with initiation interval (II) = 2; the kernel has 2 stages]

• Different iterations work on different stages in parallel

• II is the performance indicator
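A rough sketch of why II serves as the performance indicator: once the pipeline is running, a new iteration starts every II cycles, so N iterations take about (N-1)*II cycles plus the length of one iteration. The function names and the 4-cycle iteration length are assumptions made for illustration; only II = 2 comes from the slide.

```python
def pipelined_cycles(n_iterations, ii, iteration_length):
    """Cycles to run n_iterations when a new iteration starts every `ii` cycles.

    iteration_length is the number of cycles one iteration takes from its
    first to its last operation (an assumed value, not from the slides).
    """
    if n_iterations == 0:
        return 0
    # The last iteration starts at (n-1)*ii and then runs to completion.
    return (n_iterations - 1) * ii + iteration_length


def sequential_cycles(n_iterations, iteration_length):
    """Cycles when iterations run back to back without overlap."""
    return n_iterations * iteration_length


if __name__ == "__main__":
    n, ii, length = 100, 2, 4  # II = 2 as on the slide; length 4 is assumed
    print(pipelined_cycles(n, ii, length), sequential_cycles(n, length))
```

For large N the iteration length becomes negligible and the cost per iteration approaches II, which is why a smaller II means a faster loop.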

Software pipelining has been static

• Extensively studied for 3 decades, and efficient for wide-issue architectures
  – VLIW [Lam 1988]
  – superscalar [Ruttenberg et al. 1996]

• It is seen only in static compilers

• Most works aim to minimize II but not compile overhead

It is time now to extend software pipelining to dynamic compilers!

• Dynamic languages are increasingly popular
  – JavaScript and PHP are used by 88.9% and 81.5% of websites on the client and server side, respectively (W3Techs)

• Huge amount of legacy code
  – Small optimization scope: a loop iteration
  – Software pipelining enlarges the scope to many iterations

• Minimizing compile overhead must be the 1st objective
  – Only simple/fast algorithms can be used
  – Linear-time algorithms are preferred

Challenges

• Memory aliases kill parallelism
  – Hardware: Atomic region + rotating alias registers [MICRO-46]

• Costly rollback
  – Software: Light-weight checkpointing

• Scheduling is expensive

Original optimization scope:

for (i = 0; i < N; i++) {
    a
    b
    c
    d
}

Atomic region:

for (j = 0; j < N; j += M) {
    for (i = j; i < j+M; i++) {
        a
        b
        c
        d
    }
}
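The loop transformation above can be sketched as plain strip-mining: the original iteration space is cut into chunks of M iterations, and each chunk is what would run as one atomic region. The function names are hypothetical, and the guard for a final partial chunk is an addition of this sketch (the slide's inner bound is simply j+M).

```python
def original_order(n):
    """Iteration order of the original loop: for (i = 0; i < N; i++)."""
    return list(range(n))


def atomic_regions(n, m):
    """Split iterations 0..n-1 into regions of at most m iterations.

    Mirrors: for (j = 0; j < N; j += M) for (i = j; i < j+M; i++).
    The min() guard for a final partial region is an addition of this
    sketch and is not shown on the slide.
    """
    regions = []
    for j in range(0, n, m):
        regions.append(list(range(j, min(j + m, n))))
    return regions


if __name__ == "__main__":
    for region in atomic_regions(10, 4):
        print(region)
```

Flattening the regions gives back the original iteration order, so the transformation changes only the unit of optimization, not the work done.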

Framework (on Transmeta CMS)

[Figure: framework components, in the order they appear: x86 binary; Interpreter & profiler; Hot region optimizations; Acyclic scheduler; Assembler; Code cache]

Framework (on Transmeta CMS)

[Figure: the optimizer path (Hot region optimizations; Acyclic scheduler; Assembler) with the software pipelining components added: Loop selection; Initialization; Scheduling; Rotating alias reg alloc; Code generation; Acyclic scheduler. Also shown: Rotating alias register file; Atomic region]

Scheduling is expensive

• NP-complete problem to find an optimal schedule [Colland et al. 1996]

• O(V^3) at least, exponential at worst [Rau et al. 1992]
  – V: number of operations

• Can we linearize software pipelining?

Keep scheduling time under control

• Schedule in linear time
  – Use either simple or fast sub-algorithms
    • Avoid cubic or exponential complexity
  – For iterative sub-algorithms, have a threshold: the maximum #iterations allowed
    • The smaller the threshold, the less the compile overhead
    • Once exceeded, abort software pipelining
    • Key question: How small can the threshold be?

• Find a good enough schedule
  – No backtracking
  – Priority function: approximate and never updated
  – Separate dependence and resource constraints
  – Separate local and loop-carried dependences

• Iteratively improve a schedule

Just-In-Time Software Pipelining

• Quickly creates an initial schedule to start with

• Handles local dependences and resources

• Adjusts the schedule for loop-carried dependences

• Iteratively improves the schedule

• Time complexity: O(V+E)
  – V: #operations, E: #dependences

[Figure: flowchart with the steps Prepartition (H, B); Local scheduling; Kernel expansion; Rotation (≤3); Exit]

Illustration

[Figure: Prepartition step (highlighted). Operations a, b, c, d of iterations 0 to 3 are laid out at a spacing of RecMII and form a kernel. Constraints listed: Local dependences; Resources; Loop-carried dependences. Steps: Prepartition; Local scheduling; Kernel expansion; Rotation; Exit]

Calculating RecMII

• Recurrence Minimum II
  – II determined by the biggest dependence cycles
  – Needed by almost every software pipelining method

• Turn the problem into a Markov decision process
  – 1st time Howard algorithm is applied to pipelining

• Linearize Howard with a small constant H

• A by-product: critical operations

Complexity:
  Conventional: O(V^3)
  Howard policy iteration algo.: O(exponential*E)
  Linearized Howard: O(H*E)
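To make the definition concrete, here is a minimal sketch that computes RecMII for the example loop by enumerating every simple dependence cycle and taking the largest ratio of total latency to total distance. This is deliberately not Howard's algorithm and not the authors' linearized version; it is brute force, suitable only for tiny graphs. The edge list is derived from the example loop, and the unit latencies are an assumption, since the slides do not give them.

```python
from math import ceil

# Each dependence edge: (source, destination, latency, distance).
# distance = 0 for a local dependence, > 0 for a loop-carried one.
# Unit latencies are an assumption; the slides do not give latencies.
EDGES = [
    ("a", "b", 1, 0),  # x flows from a to b
    ("a", "c", 1, 0),  # x flows from a to c
    ("c", "d", 1, 0),  # B[i+2] flows from c to d
    ("b", "a", 1, 1),  # y is read by a in the next iteration
    ("c", "c", 1, 2),  # B[i+2] is read as B[i] two iterations later
    ("d", "b", 1, 2),  # A[i+2] is read as A[i] two iterations later
]


def rec_mii(edges):
    """RecMII by exhaustive enumeration of simple cycles (tiny graphs only)."""
    nodes = sorted({e[0] for e in edges} | {e[1] for e in edges})
    out = {v: [] for v in nodes}
    for src, dst, lat, dist in edges:
        out[src].append((dst, lat, dist))

    best = 0

    def walk(start, node, lat_sum, dist_sum, visited):
        nonlocal best
        for nxt, lat, dist in out[node]:
            if nxt == start:
                total_dist = dist_sum + dist
                if total_dist > 0:  # a valid cycle spans at least one iteration
                    best = max(best, ceil((lat_sum + lat) / total_dist))
            elif nxt not in visited and nxt > start:
                # Only visit nodes ordered after start so each cycle is found once.
                walk(start, nxt, lat_sum + lat, dist_sum + dist, visited | {nxt})

    for v in nodes:
        walk(v, v, 0, 0, {v})
    return best


if __name__ == "__main__":
    print(rec_mii(EDGES))
```

Under the unit-latency assumption the cycle a, b, a has latency 2 over distance 1, which gives RecMII = 2, consistent with the II = 2 shown earlier in the deck.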

Divide operations into stages

• Also an iterative sub-algorithm

• Linearize it with a constant B

• Specific order in visiting edges
  – Scan nodes in sequential order, and visit their incoming edges
    • Values are propagated along local edges in the 1st iteration
    • Values are propagated along loop-carried edges in the 2nd iteration

Complexity:
  Bellman-Ford: O(V*E)
  Linearized Bellman-Ford: O(B*E)
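The idea of a pass-limited Bellman-Ford can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it computes longest-path offsets where each edge requires offset[dst] >= offset[src] + latency - II * distance, scans nodes in program order, visits incoming edges, and gives up once the pass limit (the constant B on the slide, here `max_passes`) is used up. The edge list, unit latencies, and the stage rule (offset divided by II) are assumptions of this sketch.

```python
# Dependence edges of the example loop: (source, destination, latency, distance).
# Unit latencies are an assumption; the slides do not give latencies.
EDGES = [
    ("a", "b", 1, 0), ("a", "c", 1, 0), ("c", "d", 1, 0),  # local
    ("b", "a", 1, 1), ("c", "c", 1, 2), ("d", "b", 1, 2),  # loop-carried
]


def stage_offsets(nodes, edges, ii, max_passes):
    """Longest-path offsets under a pass limit; None means "give up".

    nodes: operations in sequential (program) order.
    An edge requires offset[dst] >= offset[src] + latency - ii * distance.
    """
    incoming = {v: [] for v in nodes}
    for src, dst, lat, dist in edges:
        incoming[dst].append((src, lat, dist))

    offset = {v: 0 for v in nodes}
    for _ in range(max_passes):
        changed = False
        for v in nodes:  # scan nodes in sequential order
            for src, lat, dist in incoming[v]:  # visit incoming edges
                candidate = offset[src] + lat - ii * dist
                if candidate > offset[v]:
                    offset[v] = candidate
                    changed = True
        if not changed:
            return offset  # converged within the threshold
    return None  # threshold exceeded: abort software pipelining


def stages(offsets, ii):
    """Stage index of each operation: offset divided by II (an assumed rule)."""
    return {v: t // ii for v, t in offsets.items()}


if __name__ == "__main__":
    result = stage_offsets(["a", "b", "c", "d"], EDGES, ii=2, max_passes=2)
    print(result, stages(result, 2))
```

With II = 2 the example converges in two passes and splits the operations into two stages. With II = 1 the offsets never settle, because the cycle a, b, a has positive weight, so the pass limit is what stops the loop.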

Illustration (Cont.)

[Figure: iterations 0 to 3 overlapped into a kernel, each iteration drawn with operations a, b, c and d; the interval is labelled II. Constraint legend: local dependences, resources, loop-carried dependences, with two X marks on the slide. Step bar: Prepartition, Local scheduling (highlighted), Kernel expansion, Rotation, Exit.]

Page 85: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Local scheduling

• Any local scheduling algorithm can be used
– E.g. list scheduling

• Weakness:
– Loop-carried dependences may be violated

• To reduce the chance of violation:
– Before scheduling, the priority function considers loop-carried dependences in advance
– Prioritize critical operations
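To make the weakness concrete, here is a minimal list-scheduling sketch in Python, followed by a check for violated loop-carried dependences. The function names, the `width` resource model (a fixed number of operations per cycle), the priorities, and the example edges are all assumptions for illustration, not the scheduler used in the talk.

```python
def list_schedule(ops, local_edges, priority, width):
    """Greedy list scheduling over local (same-iteration) edges only.

    ops: operation names; local_edges: (src, dst, latency) tuples, assumed
    acyclic; priority: name -> number, higher goes first;
    width: assumed limit on operations issued per cycle.
    """
    preds = {op: [] for op in ops}
    for src, dst, latency in local_edges:
        preds[dst].append((src, latency))

    cycle_of = {}
    cycle = 0
    while len(cycle_of) < len(ops):
        # An operation is ready once all its predecessors have finished.
        ready = [
            op for op in ops
            if op not in cycle_of
            and all(p in cycle_of and cycle_of[p] + lat <= cycle
                    for p, lat in preds[op])
        ]
        ready.sort(key=lambda op: -priority[op])
        for op in ready[:width]:
            cycle_of[op] = cycle
        cycle += 1
    return cycle_of


def violated_loop_carried(cycle_of, carried_edges, ii):
    """Return the loop-carried edges (src, dst, latency, distance) that the
    schedule breaks, i.e. where dst in a later iteration starts too early."""
    return [
        (src, dst, lat, dist)
        for src, dst, lat, dist in carried_edges
        if cycle_of[dst] + ii * dist < cycle_of[src] + lat
    ]


# Hypothetical example data.
OPS = ["a", "b", "c", "d"]
LOCAL = [("a", "b", 1), ("a", "c", 1), ("c", "d", 1)]
CARRIED = [("b", "a", 1, 1), ("d", "b", 1, 2)]
PRIORITY = {"a": 3, "b": 2, "c": 2, "d": 1}
```

In this made-up example the local schedule satisfies every loop-carried edge at II = 2, but at II = 1 the edge from b to a is violated, which is the kind of case a later repair step would have to handle.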

Page 86: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

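Illustration (Cont.)

[Figure: iterations 0 to 3 overlapped into a kernel after local scheduling, each iteration drawn with operations a, b, c and d; the interval is labelled II. Constraint legend: local dependences, resources, loop-carried dependences, with two X marks on the slide. Step bar: Prepartition, Local scheduling (highlighted), Kernel expansion, Rotation, Exit.]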

Page 89: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

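Illustration (Cont.)

[Figure: iterations 0 to 3 overlapped into a kernel, each iteration drawn with operations a, b, c and d; the interval is labelled II. Constraint legend: local dependences, resources, loop-carried dependences, with three x marks on the slide, one of them next to loop-carried dependences. Step bar: Prepartition, Local scheduling, Kernel expansion (highlighted), Rotation, Exit.]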

Page 91: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

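Illustration (Cont.)

[Figure: iterations 0 to 3 overlapped into a kernel after rotation, each iteration drawn with operations a, b, c and d. Constraint legend: local dependences, resources, loop-carried dependences, with no marks on the slide. Step bar: Prepartition, Local scheduling, Kernel expansion, Rotation (highlighted), Exit.]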

Page 94: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Experiments

• Transmeta CMS on SPEC2k traces

• Functional simulator to comprehensively
– Explore thresholds H and B
– Evaluate compile overhead and schedules’ quality

• Cycle-accurate simulator
– Simulates cache misses, latencies, …
– Initial performance study

Page 99: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Linearized Howard vs. exponential backoff + binary search [Rau et al. 1992]

[Chart: logarithmic vertical axis from 0.1 to 1000 (label not recovered); horizontal axis #dependences, 0 to 700.]

• Average 9X faster

• Peak 812X faster

• H ≤ 3 for 96% of 11,992 loops

• H ≤ 14 for all loops
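For readers unfamiliar with the baseline named on this slide, the following Python is a loose sketch of an exponential-then-binary search for the smallest integer II at which the dependence graph has no positive-weight cycle. It is not taken from Rau et al. or from the talk: the edge weight `latency - ii * distance`, the Bellman-Ford feasibility test, the function names, and the example graph are assumptions.

```python
def has_positive_cycle(num_nodes, edges, ii):
    """Bellman-Ford style test: True if some cycle has positive total
    weight, with each edge weighted latency - ii * distance (assumed)."""
    dist = [0] * num_nodes
    for _ in range(num_nodes):
        changed = False
        for src, dst, latency, distance in edges:
            weight = latency - ii * distance
            if dist[src] + weight > dist[dst]:
                dist[dst] = dist[src] + weight
                changed = True
        if not changed:
            return False  # values settled, so no positive cycle
    return True  # still changing after num_nodes passes


def rec_mii_by_search(num_nodes, edges):
    """Smallest integer II >= 1 with no positive cycle.

    Assumes every cycle has total distance > 0; otherwise the doubling
    loop below would never terminate.
    """
    # Exponential phase: double II until the graph becomes feasible.
    hi = 1
    while has_positive_cycle(num_nodes, edges, hi):
        hi *= 2
    # Binary phase: hi // 2 was infeasible (when hi > 1), hi is feasible.
    lo = hi // 2 + 1 if hi > 1 else 1
    while lo < hi:
        mid = (lo + hi) // 2
        if has_positive_cycle(num_nodes, edges, mid):
            lo = mid + 1
        else:
            hi = mid
    return hi


# Hypothetical graph: (src, dst, latency, distance), nodes 0..3 = a..d.
GRAPH = [(0, 1, 1, 0), (0, 2, 1, 0), (2, 3, 1, 0), (1, 0, 1, 1), (3, 1, 1, 2)]
```

Each probe of the search repeats a full feasibility test, which is the repeated cost that a direct computation of the recurrence bound avoids.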

Page 105: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Linearized Bellman-Ford

• B ≤ 3 for 98.8% of 11,992 loops

• B ≤ 5 for all the loops

• From now on, we set H=10, B=5 → 11,910 loops scheduled

[Chart: % of total loops (0% to 100%) versus B (2 to 5).]

Page 111: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Scheduling overhead & schedules’ quality

• JITSP achieves optimal schedules for 95% of loops

[Chart: two bar groups, each comparing RS2, DESP and JITSP: scheduling overhead, and % of loops with optimal schedules. Data labels on the slide: 12X, 1X, 95%, 23%, 13%, 25%.]

Page 112: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Compile overhead distribution

• Hot region optimizations: 34%

• Software pipelining: 33%

• Acyclic scheduler: 27%

• Assembler: 6%

Note: acyclic scheduler handles acyclic code, or loops NOT selected for software pipelining

Page 118: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Preliminary performance

• 40 hot loops

• The architecture has a bottleneck in registers
– 2/7/24/32 predicate/static alias/integer/floating point registers available for pipelining

[Pie chart: successfully generated code 25%; too many registers 55%; filtered 19%; too many unrolls 1%.]

Page 120: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Preliminary performance (Cont.)

• JITSP achieves optimal schedules for all but 1 loop

[Chart: II/MII (the lower, the better) for loops 1 to 10, comparing RS2, DESP and JITSP, with a reference marked Optimal; vertical axis 0.00 to 2.00.]

Page 123: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Preliminary performance (Cont.)

• JITSP: 10~36% speedup, better than the others

• Exception: loop 2. Optimal schedule but slowdown due to memory aliases → retranslation needed

• Speedup: swim (5.3%), ammp (4.4%), mcf (3.6%)

[Chart: speedup (-40% to 40%) for loops 1 to 10, comparing RS2, DESP and JITSP.]

Page 128: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Conclusion

• 1st linear software pipelining algorithm implemented for dynamic compilers

• Turns a traditionally-expensive optimization into linear time O(V+E)
– Taking advantage of results from various domains:
  • hardware circuit design (Retiming)
  • stochastic control (Howard algorithm)
  • graph theory (Bellman-Ford)
  • software pipelining (Rotation scheduling and DESP)

• Generates optimal or near-optimal schedules with reasonable compile overhead

Page 129: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Future work

• Register availability
– Add more architecture registers
– Algorithm: Register pressure-aware

• Implementation
– Loop selection
– Re-translation

• Evaluation
– Benchmarks

Page 130: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Backup slides


Page 134: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Howard algorithm

[Figure: dependence graph over operations a, b, c and d.]

• Each operation is rewarded to reach a cycle via 1 policy edge
– The bigger the cycle, the more the reward
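One way to read the bullet above is as the policy-evaluation step of a Howard-style iteration: every operation keeps exactly one chosen outgoing edge, and following those edges always ends in a cycle whose ratio scores the operation. The Python sketch below shows only that step. Reading "bigger" as a larger latency-to-distance ratio, the `(successor, latency, distance)` edge format, and the example policy are assumptions, not details confirmed by the slide.

```python
from fractions import Fraction


def evaluate_policy(policy):
    """Score each node by the cycle it reaches through policy edges.

    policy: node -> (successor, latency, distance), one edge per node.
    Assumes every successor is itself a key of policy and every cycle has
    total distance > 0. Returns node -> latency/distance ratio of the
    cycle reached from that node.
    """
    ratio = {}
    for start in policy:
        path = []
        position = {}  # node -> index in path, for this walk only
        node = start
        # Walk policy edges until we hit this walk's path or a scored node.
        while node not in position and node not in ratio:
            position[node] = len(path)
            path.append(node)
            node = policy[node][0]
        if node in ratio:
            score = ratio[node]
        else:
            # The walk closed a new cycle starting at `node`.
            cycle = path[position[node]:]
            total_latency = sum(policy[n][1] for n in cycle)
            total_distance = sum(policy[n][2] for n in cycle)
            score = Fraction(total_latency, total_distance)
        for n in path:
            ratio[n] = score
    return ratio


# Hypothetical policy over operations a, b, c, d.
POLICY = {
    "a": ("b", 1, 0),
    "b": ("a", 1, 1),
    "c": ("d", 1, 0),
    "d": ("b", 1, 2),
}
```

In a full Howard-style algorithm this evaluation would alternate with a policy-improvement step that switches a node to a different outgoing edge when doing so raises its score. That second step is omitted here.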

Page 135: Just-In-Time Software Pipelining - CGOcgo.org/cgo2014/wp-content/uploads/2013/05/JIT_Software_Pipelini… · Just-In-Time Software Pipelining • Quickly creates an initial schedule

Distribution statistics of JITSP
