Variation Aware Application Scheduling and Power...

146
Variation Aware Application Scheduling and Power Management for Chip Multiprocessors Radu Teodorescu* and Josep Torrellas Computer Science Department University of Illinois at Urbana-Champaign http://iacoma.cs.uiuc.edu *now at Ohio State University

Transcript of Variation Aware Application Scheduling and Power...

Variation Aware Application Scheduling and Power Management for Chip Multiprocessors

Radu Teodorescu* and Josep Torrellas

Computer Science DepartmentUniversity of Illinois at Urbana-Champaign

http://iacoma.cs.uiuc.edu

*now at Ohio State University

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Technology scaling continues

transistorsize

2

number of transistors

Pentium 3

Pentium 4

Core 2 Duo

Quad Opteron

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation in transistor parameters

3

pdf

switching speed

leakage power

nominal

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation in transistor parameters

3

pdf

switching speed

leakage power

nominal

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation in transistor parameters

3

Frequency

Power

Reliability

pdf

switching speed

leakage power

nominal

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

die-to-die

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

die-to-die

within-die

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

die-to-die fast, leaky transistors

within-die

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

die-to-die

slower, less leaky transistors

fast, leaky transistors

within-die

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Variation components

4

die-to-die

slower, less leaky transistors

fast, leaky transistors

within-die

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C20

C2

5

Effects of within-die variation

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

vs.

fastest

slowest

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

vs.

fastest

slowest

• On average:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Frequency

30%vs.

fastest

slowest

• On average:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Leakage power

2X

Frequency

30%vs.

fastest

slowest

• On average:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Total power

40%

Leakage power

2X

Frequency

30%vs.

fastest

slowest

• On average:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We model a 20-core CMP, 32nm

C20

C2

5

Effects of within-die variation

• CMPs: significant core-to-core variation in frequency and power

Total power

40%

Leakage power

2X

Frequency

30%vs.

fastest

slowest

Design-identical cores will have significantly different properties

• On average:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

How can we exploit this variation?

6

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

How can we exploit this variation?

6

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Current CMPs run at the frequency of the slowest core

slowest core

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

How can we exploit this variation?

6

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Current CMPs run at the frequency of the slowest core

slowest core

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We can run each core at the maximum frequency it can achieve

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• 15% average frequency increase

How can we exploit this variation?

6

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Current CMPs run at the frequency of the slowest core

slowest core

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We can run each core at the maximum frequency it can achieve

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• 15% average frequency increase

• Support present in AMD’s 4-core Opteron

How can we exploit this variation?

6

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Current CMPs run at the frequency of the slowest core

slowest core

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We can run each core at the maximum frequency it can achieve

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• 15% average frequency increase

• Support present in AMD’s 4-core Opteron

How can we exploit this variation?

6

• Heterogeneous system

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Current CMPs run at the frequency of the slowest core

slowest core

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• We can run each core at the maximum frequency it can achieve

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

• Expose variation in core frequency and power to the OS

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

• Expose variation in core frequency and power to the OS

• Variation-aware application scheduling algorithms

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

• Expose variation in core frequency and power to the OS

• Variation-aware application scheduling algorithms

• Variation-aware global power management subsystem

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

• Expose variation in core frequency and power to the OS

• Variation-aware application scheduling algorithms

• Variation-aware global power management subsystem

• On-line optimization algorithm that maximizes system performance at a power budget

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Contributions

• Expose variation in core frequency and power to the OS

• Variation-aware application scheduling algorithms

• Variation-aware global power management subsystem

• On-line optimization algorithm that maximizes system performance at a power budget

• 12-17% CMP throughput improvement at the same power

7

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

8

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

8

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

8

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

8

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

• Conclusions

8

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

• Conclusions

9

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

10

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

10

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

10

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

10

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

10

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

10

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

• Dynamic power consumption

10

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

• Dynamic power consumption

• Compute intensity (IPC)

10

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

• Dynamic power consumption

• Compute intensity (IPC)

10

• Multiple possible goals:

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

• Dynamic power consumption

• Compute intensity (IPC)

10

• Multiple possible goals:

• Reduce power

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

• Per core frequency and static power

• Application behavior

• Dynamic power consumption

• Compute intensity (IPC)

10

• Multiple possible goals:

• Reduce power

• Improve performance

ApplicationsAdditional information to guide scheduling decisions:

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Assign applications with high dynamic power to low static power cores

• VarP&AppP

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Assign applications with high dynamic power to low static power cores

• VarP&AppP

Applications

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

dynamic power

low high

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Assign applications with high dynamic power to low static power cores

• VarP&AppP

Applications

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

dynamic power

low high

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

11

When the goal is to reduce power consumption:

Assign applications to low static power cores first

• VarP

Assign applications with high dynamic power to low static power cores

• VarP&AppP

Applications

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

dynamic power

low high

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Assign high IPC applications to high frequency cores

• VarF&AppIPC

Applications

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Assign high IPC applications to high frequency cores

• VarF&AppIPC

Applications

IPC

low high

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Assign high IPC applications to high frequency cores

• VarF&AppIPC

Applications

IPC

low high

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Variation-aware scheduling

12

When the goal is to improve performance:

Assign applications to high frequeny cores first

• VarF

Assign high IPC applications to high frequency cores

• VarF&AppIPC

Applications

IPC

low high

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

• Conclusions

13

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

• Per core dynamic voltage and frequency scaling (DVFS)

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

• Challenge: find best (V,F) for each coreC1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

• Per core dynamic voltage and frequency scaling (DVFS)

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

• Challenge: find best (V,F) for each core

• Core-level decisions less effective in large CMPs

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

• Per core dynamic voltage and frequency scaling (DVFS)

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

• Challenge: find best (V,F) for each core

• Core-level decisions less effective in large CMPs

• Global (CMP-wide) power management solution is needed

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

• Per core dynamic voltage and frequency scaling (DVFS)

Radu Teodorescu Variation-Aware Application Scheduling and Power Management14

Variation-aware global power management

• Challenge: find best (V,F) for each core

• Core-level decisions less effective in large CMPs

• Global (CMP-wide) power management solution is needed

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

CMP power management

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

• Per core dynamic voltage and frequency scaling (DVFS)

Variation makes the problem more difficult

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Frequency

Tota

l pow

er

DVFS under variation

15

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Frequency

Tota

l pow

er

DVFS under variation

15

Vdd=1V

0.9V

0.8V

0.7V0.6V

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Frequency

Tota

l pow

er

DVFS under variation

15

1V

0.6V

0.85V

Vdd=1V

0.9V

0.8V

0.7V0.6V

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

0.4 0.5 0.6 0.7 0.8 0.9 1.0

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Frequency

Tota

l pow

er

DVFS under variation

15

1V

0.6V

0.85V

Vdd=1V

0.9V

0.8V

0.7V0.6V

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

V,F

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

V,F

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each coreGiven a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

• Goal: maximize system throughput (MIPS)

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

• Goal: maximize system throughput (MIPS)

• Constraint: keep total power below budget

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

• Goal: maximize system throughput (MIPS)

• Constraint: keep total power below budget

50W

75W

100W

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

• Goal: maximize system throughput (MIPS)

• Constraint: keep total power below budget

• Runtime system adaptation50W

75W

100W

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Optimization problem

16

FIND!

V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

• Goal: maximize system throughput (MIPS)

• Constraint: keep total power below budget

?

• Runtime system adaptation50W

75W

100W

Given a mapping of threads to cores (variation-aware):

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Possible solutions

17

FIND?

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Possible solutions

17

FIND?

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Exhaustive search: too expensive

Possible solutions

17

FIND?

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Simulated annealing (SAnn)

• Not practical at runtime

• Exhaustive search: too expensive

Possible solutions

17

FIND?

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Simulated annealing (SAnn)

• Not practical at runtime

• Linear programming (LinOpt)

• Simpler, faster

• Requires some approximations

• Exhaustive search: too expensive

Possible solutions

17

FIND?

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Simulated annealing (SAnn)

• Not practical at runtime

• Linear programming (LinOpt)

• Simpler, faster

• Requires some approximations

• Exhaustive search: too expensive

Possible solutions

17

FIND?LinOpt

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

• Conclusions

18

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

• Unknowns: voltages V1,...,Vn for all cores

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

• Unknowns: voltages V1,...,Vn for all cores

• Objective function: maximize CMP throughput

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

• Unknowns: voltages V1,...,Vn for all cores

• Objective function: maximize CMP throughput

• Throughput (MIPS) = Frequency X IPC = f(V1,...,Vn)

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

• Unknowns: voltages V1,...,Vn for all cores

• Objective function: maximize CMP throughput

• Throughput (MIPS) = Frequency X IPC = f(V1,...,Vn)

• Constraint: keep power under Ptarget

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

LinOpt problem definition

• Linear programming:

• Maximize objective function: f(x1,...,xn), with x1,...,xn independent

• Subject to constraints such as: g(x1,...,xn) < C

• f,g are linear functions

• Unknowns: voltages V1,...,Vn for all cores

• Objective function: maximize CMP throughput

• Throughput (MIPS) = Frequency X IPC = f(V1,...,Vn)

• Constraint: keep power under Ptarget

• Power = g(V)

19

Radu Teodorescu Variation-Aware Application Scheduling and Power Management20

LinOpt implementation

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• LinOpt works together with the OS scheduler

• OS scheduler maps applications to cores (e.g. VarF&AppIPC)

• LinOpt then finds (V,F) settings for each core

20

LinOpt implementation

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• LinOpt works together with the OS scheduler

• OS scheduler maps applications to cores (e.g. VarF&AppIPC)

• LinOpt then finds (V,F) settings for each core

20

LinOpt implementation

• LinOpt runs periodically as a system process

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• LinOpt works together with the OS scheduler

• OS scheduler maps applications to cores (e.g. VarF&AppIPC)

• LinOpt then finds (V,F) settings for each core

20

LinOpt implementation

• On a core

• LinOpt runs periodically as a system process

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• LinOpt works together with the OS scheduler

• OS scheduler maps applications to cores (e.g. VarF&AppIPC)

• LinOpt then finds (V,F) settings for each core

20

LinOpt implementation

• On a core

PMU• Power management unit (PMU) - e.g., Foxton

• LinOpt runs periodically as a system process

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• LinOpt works together with the OS scheduler

• OS scheduler maps applications to cores (e.g. VarF&AppIPC)

• LinOpt then finds (V,F) settings for each core

20

LinOpt implementation

• On a core

• LinOpt uses profile information as inputPMU

• Power management unit (PMU) - e.g., Foxton

• LinOpt runs periodically as a system process

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Dynamic profiling

Each app: dynamic power, IPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Dynamic profiling

Each app: dynamic power, IPC

LinOpt

Powertarget

Goal

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Dynamic profiling

Each app: dynamic power, IPC

LinOpt

Powertarget

Goal

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Dynamic profiling

Each app: dynamic power, IPC

LinOpt

Powertarget

Goal

LinOpt

10ms Time

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

Radu Teodorescu Variation-Aware Application Scheduling and Power Management21

LinOpt implementation

Post-manufacturing profiling

Each core: frequency, static power

Dynamic profiling

Each app: dynamic power, IPC

LinOpt

Powertarget

Goal

LinOpt

10ms Time

OS scheduling interval

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

V,F V,F V,F V,F V,F

best (Vi,Fi) of each core

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Outline

• Variation-aware scheduling

• Variation-aware power management

• Defining the optimization problem

• Implementation

• Evaluation

• Conclusions

22

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Evaluation infrastructure

• Process variation model - VARIUS [IEEE TSM’08]

• Monte Carlo simulations for 200 chips

• SESC - cycle accurate microarchitectural simulator

• SPICE model - leakage power

• Hotspot - temperature estimation

23

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Evaluation infrastructure

• 20-core CMP

• 2-issue, OOO cores

• Shared L2 cache

• 32nm technology, 4GHz

24

C1 C2 C3 C4 C5

C6 C7 C8 C9 C10

C11 C12 C13 C14 C15

C16 C17 C18 C19 C20

L2 Cache

L2 Cache

• Multiprogrammed workload:

• From a pool of SPECint and SPECfp benchmarks

Radu Teodorescu Variation-Aware Application Scheduling and Power Management25

Variation-aware scheduling

Radu Teodorescu Variation-Aware Application Scheduling and Power Management25

Variation-aware scheduling

Goal: Improve CMP throughput

Radu Teodorescu Variation-Aware Application Scheduling and Power Management25

Variation-aware scheduling

Goal: Improve CMP throughput

Naive VarF VarF&AppIPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management25

Variation-aware scheduling

2 Threads 4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

MIP

S

Goal: Improve CMP throughput

Naive VarF VarF&AppIPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management25

Variation-aware scheduling

2 Threads 4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

MIP

S

Goal: Improve CMP throughput

Naive VarF VarF&AppIPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• VarF: up to 9% throughput improvement over Naive

9%7% 5% 2% 0%

25

Variation-aware scheduling

2 Threads 4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

MIP

S

Goal: Improve CMP throughput

Naive VarF VarF&AppIPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• VarF: up to 9% throughput improvement over Naive

9%7% 5% 2% 0%

• VarF&AppIPC scales better with number of threads: 5-10% improvement over Naive

10%5%

8% 7% 6%

25

Variation-aware scheduling

2 Threads 4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

MIP

S

Goal: Improve CMP throughput

Naive VarF VarF&AppIPC

Radu Teodorescu Variation-Aware Application Scheduling and Power Management26

• Goal: maximize throughput

• Constraint: keep power below budget (75W)

Global power management algorithms:

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Foxton+: baseline

26

• Goal: maximize throughput

• Constraint: keep power below budget (75W)

Global power management algorithms:

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Foxton+: baseline

LinOpt: proposed scheme

26

• Goal: maximize throughput

• Constraint: keep power below budget (75W)

Global power management algorithms:

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Foxton+: baseline

LinOpt: proposed scheme

SAnn: approximate upper bound

26

• Goal: maximize throughput

• Constraint: keep power below budget (75W)

Global power management algorithms:

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

Foxton+ LinOpt SAnn

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

MIP

S

Foxton+ LinOpt SAnn

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

MIP

S

Foxton+ LinOpt SAnn

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

MIP

S

Foxton+ LinOpt SAnn

12%17%

13% 16%

• LinOpt: 12-17% improvement over Foxton+, at the same power

Variation-aware power management

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

MIP

S

Foxton+ LinOpt SAnn

12%17%

13% 16%

• LinOpt: 12-17% improvement over Foxton+, at the same power

Variation-aware power management

• 30-38% reduction in ED2

Radu Teodorescu Variation-Aware Application Scheduling and Power Management27

4 Threads 8 Threads 16 Threads 20 Threads0.5

0.6

0.7

0.8

0.9

1.0

1.1

1.2

MIP

S

Foxton+ LinOpt SAnn

12%17%

13% 16%

• LinOpt: 12-17% improvement over Foxton+, at the same power

• LinOpt within 2% of SAnn

Variation-aware power management

• 30-38% reduction in ED2

Radu Teodorescu Variation-Aware Application Scheduling and Power Management28

Time overhead of LinOpt

Radu Teodorescu Variation-Aware Application Scheduling and Power Management28

Time overhead of LinOpt

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management28

Time overhead of LinOpt

1 2 4 8 16 200

1.5

3.0

4.5

6.0T

ime(

mic

rose

cond

s)

Number of threads

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management28

Time overhead of LinOpt

1 2 4 8 16 200

1.5

3.0

4.5

6.0T

ime(

mic

rose

cond

s)

Number of threads

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Low overhead even for large problem size

28

Time overhead of LinOpt

1 2 4 8 16 200

1.5

3.0

4.5

6.0T

ime(

mic

rose

cond

s)

Number of threads

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Low overhead even for large problem size

• Up to 6 µs for 20 threads

28

Time overhead of LinOpt

1 2 4 8 16 200

1.5

3.0

4.5

6.0T

ime(

mic

rose

cond

s)

Number of threads

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

• Low overhead even for large problem size

• Up to 6 µs for 20 threads

• LinOpt runs on a core every 1-10 ms - negligible impact

28

Time overhead of LinOpt

1 2 4 8 16 200

1.5

3.0

4.5

6.0T

ime(

mic

rose

cond

s)

Number of threads

50W 75W 100W

Radu Teodorescu Variation-Aware Application Scheduling and Power Management

Conclusions

• We showed the value of exposing variation in core frequency and power to the OS

• Proposed a set of scheduling algorithms

• reduce CMP power consumption (2-16%)

• improve CMP throughput (5-10%)

• Proposed a power management algorithm

• improve CMP throughput for a given power budget (12-17%)

29

Variation Aware Application Scheduling and Power Management for Chip Multiprocessors

Radu Teodorescu* and Josep Torrellas

Computer Science DepartmentUniversity of Illinois at Urbana-Champaign

http://iacoma.cs.uiuc.edu

*now at Ohio State University