
Page 1: Language and Compiler Support for Auto-Tuning Variable …people.csail.mit.edu/jansel/papers/slides-ansel-cgo2011.pdf · Timsort? Poly-algorithms Answer It depends! Jason Ansel (MIT)

Language and Compiler Support for Auto-Tuning Variable-Accuracy Algorithms

Jason Ansel, Yee Lok Wong, Cy Chan, Marek Olszewski, Alan Edelman, Saman Amarasinghe

MIT - CSAIL

April 4, 2011

Jason Ansel (MIT) PetaBricks April 4, 2011 1 / 30


Outline

1 Motivating Example

2 PetaBricks Language Overview

3 Variable Accuracy

4 Autotuner

5 Results

6 Conclusions


A motivating example

How would you write a fast sorting algorithm?

Insertion sort, Quick sort, Merge sort, Radix sort

Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort?

Poly-algorithms


std::stable_sort

/usr/include/c++/4.5.2/bits/stl_algo.h lines 3350-3367


std::sort

/usr/include/c++/4.5.2/bits/stl_algo.h lines 2163-2167

Why 16? Why 15?

Dates back to at least 2000 (Jun 2000 SGI release)

Still in current C++ STL shipped with GCC

10+ years of _S_threshold = 16
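For readers who have not looked inside `stl_algo.h`, the role of `_S_threshold` can be sketched as below. This is a simplified illustration, not the libstdc++ code. The real `std::sort` is an introsort, using median-of-three pivots and a heapsort fallback when recursion gets too deep; both are omitted here. The names `kThreshold`, `quicksort_loop`, `insertion_sort`, and `threshold_sort` are made up for the example. What the sketch does show is the handoff: quicksort stops splitting once a partition holds at most 16 elements, and one insertion-sort pass over the whole array finishes the job.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Plays the role of libstdc++'s _S_threshold (hypothetical name for this sketch).
constexpr std::ptrdiff_t kThreshold = 16;

using It = std::vector<int>::iterator;

// Quicksort that stops once a partition has at most kThreshold elements,
// leaving those small partitions unsorted but correctly placed relative to each other.
void quicksort_loop(It first, It last) {
    while (last - first > kThreshold) {
        const int pivot = *(first + (last - first) / 2);
        // Three-way split: [< pivot) [== pivot) [> pivot); the middle part is never empty.
        It mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        It mid2 = std::partition(mid1, last, [pivot](int x) { return !(pivot < x); });
        quicksort_loop(mid2, last);  // recurse on the right part
        last = mid1;                 // keep looping on the left part
    }
}

// Binary insertion sort: upper_bound finds the slot, rotate shifts the gap.
void insertion_sort(It first, It last) {
    for (It i = first; i != last; ++i)
        std::rotate(std::upper_bound(first, i, *i), i, i + 1);
}

void threshold_sort(std::vector<int>& v) {
    quicksort_loop(v.begin(), v.end());
    // One final pass: every element already sits inside its small partition,
    // so each insertion only shifts elements within that partition.
    insertion_sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));  // debug-only postcondition
}
```

Here `kThreshold` is a compile-time constant, as `_S_threshold` is. The handoff point is therefore fixed when the library is written, not chosen for the machine it runs on.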


Is 15 the right number?

The best cutoff (CO) changes from machine to machine

Depends on competing costs:

Cost of computation (< operator, call overhead, etc.)
Cost of communication (swaps)
Cache behavior (misses, prefetcher, locality)

Sorting 100000 doubles with std::stable_sort:

CO ≈ 200 optimal on a Phenom 905e (15% speedup over CO = 15)
CO ≈ 400 optimal on an Opteron 6168 (15% speedup over CO = 15)
CO ≈ 500 optimal on a Xeon E5320 (34% speedup over CO = 15)
CO ≈ 700 optimal on a Xeon X5460 (25% speedup over CO = 15)

The compiler’s hands are tied: it is stuck with 15
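Finding numbers like these requires trying cutoffs and timing them. Below is a minimal sketch of such an exhaustive sweep. It assumes a merge sort that hands small ranges to insertion sort, as a stand-in for `std::stable_sort` rather than its actual implementation, and wall-clock timing with `std::chrono::steady_clock`. `hybrid_sort`, `time_once`, and `tune_cutoff` are hypothetical names. The cutoff it reports depends on the machine it runs on, which is exactly the problem described above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <random>
#include <vector>

// Straight insertion sort on v[lo, hi); the strict '>' keeps it stable.
void insertion_sort(std::vector<double>& v, std::size_t lo, std::size_t hi) {
    for (std::size_t i = lo + 1; i < hi; ++i) {
        const double x = v[i];
        std::size_t j = i;
        while (j > lo && v[j - 1] > x) {
            v[j] = v[j - 1];
            --j;
        }
        v[j] = x;
    }
}

// Stable merge sort on v[lo, hi) that switches to insertion sort at or below `cutoff`.
void hybrid_sort(std::vector<double>& v, std::size_t lo, std::size_t hi, std::size_t cutoff) {
    if (hi - lo < 2 || hi - lo <= cutoff) {
        insertion_sort(v, lo, hi);
        return;
    }
    const std::size_t mid = lo + (hi - lo) / 2;
    hybrid_sort(v, lo, mid, cutoff);
    hybrid_sort(v, mid, hi, cutoff);
    auto at = [&v](std::size_t k) { return v.begin() + static_cast<std::ptrdiff_t>(k); };
    std::inplace_merge(at(lo), at(mid), at(hi));
}

// Seconds needed to sort a fresh copy of `input` with the given cutoff.
double time_once(const std::vector<double>& input, std::size_t cutoff) {
    std::vector<double> v = input;
    const auto t0 = std::chrono::steady_clock::now();
    hybrid_sort(v, 0, v.size(), cutoff);
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Exhaustive sweep: return the candidate with the lowest best-of-`trials` time.
std::size_t tune_cutoff(std::size_t n, const std::vector<std::size_t>& candidates, int trials) {
    assert(!candidates.empty() && trials > 0);
    std::mt19937 rng(12345);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<double> input(n);
    for (double& x : input) x = dist(rng);

    std::size_t best = candidates.front();
    double best_time = std::numeric_limits<double>::infinity();
    for (std::size_t c : candidates) {
        double t = std::numeric_limits<double>::infinity();
        for (int k = 0; k < trials; ++k) t = std::min(t, time_once(input, c));
        if (t < best_time) {
            best_time = t;
            best = c;
        }
    }
    return best;
}
```

A call such as `tune_cutoff(100000, {15, 200, 400, 500, 700}, 5)` sweeps the cutoff values from the measurements above on 100000 doubles. Taking the best of several trials reduces timing noise.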


Back to our motivating example

How would you write a fast sorting algorithm?

Insertion sort, Quick sort, Merge sort, Radix sort

Binary tree sort, Bitonic sort, Bubble sort, Bucket sort, Burstsort, Cocktail sort, Comb sort, Counting sort, Distribution sort, Flashsort, Heapsort, Introsort, Library sort, Odd-even sort, Postman sort, Samplesort, Selection sort, Shell sort, Stooge sort, Strand sort, Timsort?

Poly-algorithms

Answer

It depends!


Autotuned parallel sorting algorithms

On a Xeon E7340 (2× 4 cores):
1. Insertion sort below 600
2. Quick sort below 1420
3. 2-way parallel merge sort

On a Sun Fire T200 Niagara (8 cores):
1. 16-way merge sort below 75
2. 8-way merge sort below 1461
3. 4-way merge sort below 2400
4. 2-way parallel merge sort

235% slowdown running the Niagara algorithm on the Xeon

8% slowdown running the Xeon algorithm on the Niagara

Need a way to express these algorithmic choices to enable autotuning


2 PetaBricks Language Overview


Algorithmic choices

Language

either {
  InsertionSort(out, in);
} or {
  QuickSort(out, in);
} or {
  MergeSort(out, in);
} or {
  RadixSort(out, in);
}

Representation

Decision tree synthesized by our evolutionary algorithm (EA)
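One simple way to picture such a representation is as a decision list keyed on input size. The list is consulted again at every recursive call, so a single run can mix algorithms. This is a sketch under stated assumptions, not the PetaBricks runtime:

- `Choice`, `Rule`, `Config`, `choose`, and `tuned_sort` are hypothetical names.
- The merge step is sequential rather than parallel.
- Radix sort is left out to keep the example short.

The example configuration reuses the Xeon E7340 thresholds reported earlier: insertion sort below 600, quick sort below 1420, merge sort above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class Choice { Insertion, Quick, Merge };

// One rule of the decision list: use `choice` when the input size is below `below`.
struct Rule {
    std::size_t below;
    Choice choice;
};

// A tuned configuration: rules in ascending order of `below`, plus a fallback.
struct Config {
    std::vector<Rule> rules;
    Choice fallback;
};

Choice choose(const Config& cfg, std::size_t n) {
    assert(std::is_sorted(cfg.rules.begin(), cfg.rules.end(),
                          [](const Rule& a, const Rule& b) { return a.below < b.below; }));
    for (const Rule& r : cfg.rules)
        if (n < r.below) return r.choice;
    return cfg.fallback;
}

using It = std::vector<int>::iterator;

// Every recursive call re-consults the configuration.
void tuned_sort(It first, It last, const Config& cfg) {
    const std::size_t n = static_cast<std::size_t>(last - first);
    if (n < 2) return;
    switch (choose(cfg, n)) {
    case Choice::Insertion:
        for (It i = first; i != last; ++i)
            std::rotate(std::upper_bound(first, i, *i), i, i + 1);
        break;
    case Choice::Quick: {
        const int pivot = *(first + static_cast<std::ptrdiff_t>(n / 2));
        It mid1 = std::partition(first, last, [pivot](int x) { return x < pivot; });
        It mid2 = std::partition(mid1, last, [pivot](int x) { return !(pivot < x); });
        tuned_sort(first, mid1, cfg);  // elements < pivot
        tuned_sort(mid2, last, cfg);   // elements > pivot
        break;
    }
    case Choice::Merge: {
        It mid = first + static_cast<std::ptrdiff_t>(n / 2);
        tuned_sort(first, mid, cfg);
        tuned_sort(mid, last, cfg);
        std::inplace_merge(first, mid, last);
        break;
    }
    }
}
```

A large input sorted with the Xeon configuration starts with merge sort. It hands medium-sized pieces to quick sort and finishes small pieces with insertion sort, which matches the shape of the tuned algorithms listed above.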


The PetaBricks language

Choices expressed in the language

High-level algorithmic choices
Dependency-based synthesized outer control flow
Parallelization strategy

Programs automatically adapt to their environment

Tuned using our bottom-up evaluation algorithm
Offline autotuner or always-on online autotuner


3 Variable Accuracy


Variable accuracy algorithms

Many problems don’t have a single correct answer

Soft computing

Approximation algorithms for NP-hard problems

DSP algorithms

Different grid resolutions
Data precisions

Iterative algorithms

Choosing convergence criteria


Variable accuracy example

Example

...
for (int i = 0; i < 100; ++i) {
  SORIteration(tmp);
}
...

Competing objectives of performance and accuracy

Must maximize performance while meeting accuracy targets
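To make the trade-off concrete, here is a minimal sketch built on several assumptions:

- The problem is a 1-D Poisson model problem: `A u = b`, with `A = tridiag(-1, 2, -1)`, `b` all ones, and zero boundary values.
- `sor_iteration` stands in for the slide's `SORIteration`.
- The accuracy metric is the root-mean-square residual.
- The relaxation factor is `omega = 1.5`.

All of these, and every other name in the block, are assumptions. The sketch shows that a fixed iteration count gives very different accuracy at different problem sizes.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One in-place SOR sweep for A u = b, with A = tridiag(-1, 2, -1) and
// zero boundary values just outside the n unknowns.
void sor_iteration(std::vector<double>& u, const std::vector<double>& b, double omega) {
    assert(u.size() == b.size());
    const std::size_t n = u.size();
    for (std::size_t i = 0; i < n; ++i) {
        const double left = (i > 0) ? u[i - 1] : 0.0;
        const double right = (i + 1 < n) ? u[i + 1] : 0.0;
        const double gauss_seidel = 0.5 * (b[i] + left + right);  // solve row i for u[i]
        u[i] += omega * (gauss_seidel - u[i]);                    // over-relax toward it
    }
}

// Assumed accuracy metric: root-mean-square of the residual b - A u.
double rms_residual(const std::vector<double>& u, const std::vector<double>& b) {
    const std::size_t n = u.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double left = (i > 0) ? u[i - 1] : 0.0;
        const double right = (i + 1 < n) ? u[i + 1] : 0.0;
        const double r = b[i] - (2.0 * u[i] - left - right);
        sum += r * r;
    }
    return std::sqrt(sum / static_cast<double>(n));
}

// The slide's fixed loop: `iterations` sweeps starting from u = 0, then measure.
double residual_after(std::size_t n, int iterations, double omega) {
    std::vector<double> u(n, 0.0);
    const std::vector<double> b(n, 1.0);
    for (int i = 0; i < iterations; ++i) sor_iteration(u, b, omega);
    return rms_residual(u, b);
}
```

After 100 sweeps the residual for `n = 8` is already near round-off, while for `n = 32` it is still much larger. A hard-coded count is therefore either wasteful or insufficient, depending on the input.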


Accuracy metrics and for_enough loops

Language

accuracy_metric MyRMSError
...
for_enough {
  SORIteration(tmp);
}

Representation

Function from problem size to number of iterations, synthesized by our EA
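Here is a minimal sketch of what such a function might look like once synthesized. It assumes the function is stored as a small table of (maximum problem size, iteration count) pairs. It also assumes a `for_enough` loop is lowered to an ordinary counted loop over that many iterations. `IterationTable`, `iterations_for`, and `for_enough` are hypothetical C++ names, and the table values in the usage note are made up for illustration, not tuned results.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical tuned function from problem size to iteration count: (max size, iterations)
// pairs in ascending size order; the last entry also covers anything larger.
using IterationTable = std::vector<std::pair<std::size_t, int>>;

int iterations_for(const IterationTable& table, std::size_t n) {
    assert(!table.empty());
    for (const auto& entry : table)
        if (n <= entry.first) return entry.second;
    return table.back().second;
}

// One possible lowering of `for_enough { body }`: a counted loop whose trip count
// is looked up at run time. Returns the number of iterations executed.
template <typename Body>
int for_enough(const IterationTable& table, std::size_t n, Body body) {
    const int k = iterations_for(table, n);
    for (int i = 0; i < k; ++i) body();
    return k;
}
```

With the made-up table `{{64, 10}, {1024, 40}, {65536, 150}}`, a problem of size 500 would run the loop body 40 times.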


Accuracy variables

Language

accuracy_metric MyRMSError
accuracy_variable k
...
for (int i = 0; i < k; ++i) {
  SORIteration(tmp);
}

Representation

Function from problem size to k, synthesized by our EA
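To see what the tuner is looking for, here is a brute-force stand-in: for each training size it finds the smallest `k` whose result meets an accuracy target. It replaces the evolutionary algorithm with exhaustive search. It also assumes a 1-D Poisson model problem (`A = tridiag(-1, 2, -1)`, `b` all ones, SOR with a fixed `omega`) and an RMS-residual metric. `min_k_for` and `learn_k` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// RMS of the residual b - A u for A = tridiag(-1, 2, -1) with zero boundaries.
double rms_residual(const std::vector<double>& u, const std::vector<double>& b) {
    const std::size_t n = u.size();
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double left = (i > 0) ? u[i - 1] : 0.0;
        const double right = (i + 1 < n) ? u[i + 1] : 0.0;
        const double r = b[i] - (2.0 * u[i] - left - right);
        sum += r * r;
    }
    return std::sqrt(sum / static_cast<double>(n));
}

// Smallest number of SOR sweeps k that brings the metric to <= target,
// or -1 if that does not happen within max_k sweeps.
int min_k_for(std::size_t n, double omega, double target, int max_k) {
    assert(n > 0);
    std::vector<double> u(n, 0.0);
    const std::vector<double> b(n, 1.0);
    for (int k = 0; k <= max_k; ++k) {
        if (rms_residual(u, b) <= target) return k;  // u has had exactly k sweeps here
        for (std::size_t i = 0; i < n; ++i) {       // one in-place SOR sweep
            const double left = (i > 0) ? u[i - 1] : 0.0;
            const double right = (i + 1 < n) ? u[i + 1] : 0.0;
            u[i] += omega * (0.5 * (b[i] + left + right) - u[i]);
        }
    }
    return -1;
}

// A sampled "function from problem size to k" at the given training sizes.
std::vector<std::pair<std::size_t, int>> learn_k(const std::vector<std::size_t>& sizes,
                                                 double omega, double target) {
    std::vector<std::pair<std::size_t, int>> table;
    for (std::size_t n : sizes) table.emplace_back(n, min_k_for(n, omega, target, 100000));
    return table;
}
```

The resulting pairs form the kind of size-to-`k` mapping the slide describes. The real autotuner synthesizes that mapping with its EA rather than by exhaustive search.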


Variable accuracy and algorithmic choices

Language

accuracy_metric MyRMSError
...
either {
  for_enough {
    SORIteration(tmp);
  }
} or {
  Multigrid(tmp);
} or {
  DirectSolve(tmp);
}
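Unlike the `for_enough` SOR branch, `DirectSolve` needs no iteration count. As a sketch, assume the same kind of 1-D model problem (`A = tridiag(-1, 2, -1)`, zero boundaries). A direct solve can then be the Thomas algorithm for tridiagonal systems. This is a swapped-in example, not the solver from the talk, and `direct_solve` is a hypothetical name. It reaches round-off-level accuracy in a single O(n) pass, so for this choice the autotuner only has to weigh cost.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Thomas algorithm for A x = rhs with A = tridiag(-1, 2, -1).
// Forward elimination builds the modified super-diagonal (cp) and right-hand side (dp);
// back substitution then recovers x.
std::vector<double> direct_solve(const std::vector<double>& rhs) {
    const std::size_t n = rhs.size();
    std::vector<double> cp(n), dp(n), x(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double prev_c = (i > 0) ? cp[i - 1] : 0.0;
        const double prev_d = (i > 0) ? dp[i - 1] : 0.0;
        const double denom = 2.0 + prev_c;  // diag - sub * cp[i-1], with sub = -1
        assert(denom > 0.0);                // cp stays in (-1, 0), so denom is in (1, 2]
        cp[i] = -1.0 / denom;               // super-diagonal entry is -1
        dp[i] = (rhs[i] + prev_d) / denom;  // rhs[i] - sub * dp[i-1]
    }
    for (std::size_t i = n; i-- > 0;)
        x[i] = dp[i] - ((i + 1 < n) ? cp[i] * x[i + 1] : 0.0);
    return x;
}
```

For `rhs` all ones, the exact solution is `x[i] = (i + 1) * (n - i) / 2`. That formula makes the solver easy to check.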


4 Autotuner


Traditional evolutionary algorithm

Initial population 72.7s 10.5s 4.1s 31.2s Cost = 118.5

Generation 2 4.2s 5.1s 2.6s 13.2s Cost = 25.1

Generation 3 2.8s 0.1s 3.8s 2.3s Cost = 9.0

Generation 4 0.3s 0.1s 0.4s 2.4s Cost = 3.2

The cost of autotuning is front-loaded in the initial (unfit) population

We could speed up tuning if we started with a faster initial population

Key insight

Smaller input sizes can be used to form a better initial population
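Read each Cost entry as the sum of the four candidates' test times; the figures are consistent with that reading. The front-loading is then easy to quantify: the initial population accounts for roughly three quarters of the time spent across these four generations.

```latex
\begin{aligned}
C_{\text{init}} &= 72.7 + 10.5 + 4.1 + 31.2 = 118.5\ \text{s} \\
C_{2} &= 4.2 + 5.1 + 2.6 + 13.2 = 25.1\ \text{s} \\
C_{3} &= 2.8 + 0.1 + 3.8 + 2.3 = 9.0\ \text{s} \\
C_{4} &= 0.3 + 0.1 + 0.4 + 2.4 = 3.2\ \text{s} \\
\frac{C_{\text{init}}}{C_{\text{init}} + C_2 + C_3 + C_4} &= \frac{118.5}{155.8} \approx 0.76
\end{aligned}
```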


Bottom-up evolutionary algorithm

Train on input size 1 to form the initial population for:

Train on input size 2 to form the initial population for:

Train on input size 8 to form the initial population for:

Train on input size 16 to form the initial population for:

Train on input size 32 to form the initial population for:

Train on input size 64

Naturally exploits optimal substructure of problems
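Below is a toy version of the bottom-up schedule, resting on loud assumptions:

- A configuration is a single integer knob.
- A made-up `cost` function stands in for actually running and timing the program.
- The EA is a bare mutate-and-keep-the-fitter-half loop.
- Training sizes simply double from 1; the slide's own schedule is listed above.

What the sketch does show faithfully is the structure: the population evolved at one size becomes the starting population for the next size. All names (`Config`, `cost`, `evolve`, `bottom_up`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

using Config = int;  // one tunable knob, e.g. a cutoff

// Made-up stand-in for "run with this config at input size n and time it".
// Its optimum (n / 4) grows with n.
double cost(Config c, std::size_t n) {
    const double d = static_cast<double>(c) - static_cast<double>(n) / 4.0;
    return d * d + 1.0;
}

// Plain elitist EA at a fixed input size: mutate every member, keep the fitter half.
std::vector<Config> evolve(std::vector<Config> pop, std::size_t n, int generations,
                           std::mt19937& rng) {
    std::uniform_int_distribution<int> step(-3, 3);
    const std::size_t size = pop.size();
    for (int g = 0; g < generations; ++g) {
        for (std::size_t i = 0; i < size; ++i) {
            const Config child = std::max(1, pop[i] + step(rng));  // mutated copy of member i
            pop.push_back(child);
        }
        std::stable_sort(pop.begin(), pop.end(),
                         [n](Config a, Config b) { return cost(a, n) < cost(b, n); });
        pop.resize(size);  // survivors: the fittest `size` of parents + children
    }
    return pop;
}

// Bottom-up schedule: the population tuned at size n seeds the search at size 2n.
std::vector<Config> bottom_up(std::size_t max_n, std::size_t pop_size, int generations,
                              unsigned seed) {
    assert(pop_size > 0);
    std::mt19937 rng(seed);
    std::vector<Config> pop(pop_size, 1);  // trivial starting point at size 1
    for (std::size_t n = 1; n <= max_n; n *= 2)
        pop = evolve(pop, n, generations, rng);
    return pop;  // fittest first, for the largest size trained
}
```

Each stage starts close to the previous stage's optimum rather than from scratch. This is the effect the slide attributes to optimal substructure.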


Variable accuracy autotuner

Generation i

⇒ Generation i + 1

Partition accuracy space into discrete levels

Prune population to have a fixed number of representatives from each level
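Here is a minimal sketch of the pruning step, with these assumptions:

- Each candidate has already been measured for accuracy and run time.
- The discrete levels are given as ascending accuracy thresholds. A candidate belongs to the highest level whose threshold it meets, or to level 0 if it meets none.
- The representatives kept are simply the fastest in each level.

`Candidate` and `prune` are hypothetical names, and the real autotuner's selection policy may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Candidate {
    double accuracy;  // measured accuracy (higher is better)
    double time;      // measured run time in seconds (lower is better)
    int id;
};

// Keep at most `keep` of the fastest candidates from each accuracy level.
std::vector<Candidate> prune(const std::vector<Candidate>& pop,
                             const std::vector<double>& thresholds, std::size_t keep) {
    assert(std::is_sorted(thresholds.begin(), thresholds.end()));
    std::map<std::size_t, std::vector<Candidate>> levels;
    for (const Candidate& c : pop) {
        // Level = number of thresholds this candidate meets or exceeds.
        const std::size_t level = static_cast<std::size_t>(
            std::upper_bound(thresholds.begin(), thresholds.end(), c.accuracy) -
            thresholds.begin());
        levels[level].push_back(c);
    }
    std::vector<Candidate> survivors;
    for (auto& entry : levels) {
        std::vector<Candidate>& members = entry.second;
        std::stable_sort(members.begin(), members.end(),
                         [](const Candidate& a, const Candidate& b) { return a.time < b.time; });
        if (members.size() > keep) members.resize(keep);
        survivors.insert(survivors.end(), members.begin(), members.end());
    }
    return survivors;  // grouped by level (ascending), fastest first within each level
}
```

Keeping a quota per level stops fast but inaccurate candidates from crowding out slower, accurate ones. The next generation therefore still has material to offer for every accuracy target.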

Outline

1 Motivating Example

2 PetaBricks Language Overview

3 Variable Accuracy

4 Autotuner

5 Results

6 Conclusions

Changing accuracy requirements

[Figure: Image Compression. Speedup (x) vs. input size, one curve per accuracy level: 2.0, 1.5, 1.0, 0.8, 0.6, 0.3]

Changing accuracy requirements

[Figure: speedup (x) vs. input size for six benchmarks, one curve per accuracy level]

Clustering: accuracy levels 0.95, 0.75, 0.50, 0.20, 0.10, 0.05

Bin Packing: accuracy levels 1.01, 1.1, 1.2, 1.3, 1.4

Image Compression: accuracy levels 2.0, 1.5, 1.0, 0.8, 0.6, 0.3

3D Helmholtz: accuracy levels 10^9, 10^7, 10^5, 10^3, 10^1

2D Poisson: accuracy levels 10^9, 10^7, 10^5, 10^3, 10^1

Preconditioner: accuracy levels 3.0, 2.0, 1.5, 1.0, 0.5, 0.0

Multigrid choice space

Pseudo code

accuracy metric MyRMSError
...
either {
  for enough {
    SORIteration(tmp);
  }
} or {
  Multigrid(tmp);
} or {
  DirectSolve(tmp);
}
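
The three-way choice can be mirrored in Python for a 1D Poisson problem, -u'' = f with zero boundary values, as a minimal sketch. Assumptions: the RMS residual stands in for the `MyRMSError` accuracy metric (the slide does not define it); "for enough" is modeled as "repeat until the metric meets a tolerance"; the multigrid choice is a plain recursive V-cycle with Gauss-Seidel smoothing; and the direct solve is the Thomas tridiagonal algorithm. In PetaBricks the autotuner would make the choice; here it is simply a `choice` argument.

```python
import math
from typing import List

Vec = List[float]


def residual(u: Vec, f: Vec, h: float) -> Vec:
    """r = f - A u, with (A u)[i] = (2 u[i] - u[i-1] - u[i+1]) / h^2 and zero boundaries."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(f[i] - (2 * u[i] - left - right) / (h * h))
    return r


def rms(v: Vec) -> float:
    """Root-mean-square norm, the stand-in accuracy metric in this sketch."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def sor_sweep(u: Vec, f: Vec, h: float, omega: float) -> None:
    """One in-place SOR sweep; omega = 1.0 is plain Gauss-Seidel."""
    n = len(u)
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        u[i] += omega * (0.5 * (left + right + h * h * f[i]) - u[i])


def direct_solve(f: Vec, h: float) -> Vec:
    """Thomas algorithm for the tridiagonal system (-1, 2, -1) u = h^2 f."""
    n = len(f)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = -0.5, 0.5 * h * h * f[0]
    for i in range(1, n):
        m = 2.0 + cp[i - 1]  # pivot: b - a * cp[i-1] with a = -1, b = 2
        cp[i] = -1.0 / m
        dp[i] = (h * h * f[i] + dp[i - 1]) / m
    u = [0.0] * n
    u[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        u[i] = dp[i] - cp[i] * u[i + 1]
    return u


def v_cycle(u: Vec, f: Vec, h: float, sweeps: int = 2) -> Vec:
    """One multigrid V-cycle; assumes len(u) == 2**k - 1."""
    n = len(u)
    if n <= 3:
        return direct_solve(f, h)  # coarsest grid: solve exactly
    for _ in range(sweeps):
        sor_sweep(u, f, h, omega=1.0)  # pre-smoothing
    r = residual(u, f, h)
    nc = (n - 1) // 2  # coarse point j coincides with fine point 2j + 1
    rc = [0.25 * r[2 * j] + 0.5 * r[2 * j + 1] + 0.25 * r[2 * j + 2] for j in range(nc)]
    ec = v_cycle([0.0] * nc, rc, 2 * h, sweeps)
    for i in range(n):  # linear interpolation of the coarse-grid correction
        m = i // 2
        if i % 2 == 1:
            u[i] += ec[m]
        else:
            u[i] += 0.5 * ((ec[m - 1] if m > 0 else 0.0) + (ec[m] if m < nc else 0.0))
    for _ in range(sweeps):
        sor_sweep(u, f, h, omega=1.0)  # post-smoothing
    return u


def solve(f: Vec, h: float, tol: float, choice: str) -> Vec:
    """Mirror of the either/or block: choice is 'sor', 'multigrid' or 'direct'."""
    if choice == "direct":
        return direct_solve(f, h)
    if choice not in ("sor", "multigrid"):
        raise ValueError(f"unknown choice {choice!r}")
    u = [0.0] * len(f)
    for _ in range(100_000):  # "for enough": iterate until the metric is met
        if rms(residual(u, f, h)) <= tol:
            break
        if choice == "sor":
            sor_sweep(u, f, h, omega=1.8)  # hypothetical hand-picked omega
        else:
            u = v_cycle(u, f, h)
    return u


if __name__ == "__main__":
    n = 31
    h = 1.0 / (n + 1)
    f = [1.0] * n  # with f = 1 the exact solution is x (1 - x) / 2
    exact = [(i + 1) * h * (1 - (i + 1) * h) / 2 for i in range(n)]
    for choice in ("sor", "multigrid", "direct"):
        u = solve(f, h, 1e-6, choice)
        print(choice, max(abs(a - b) for a, b in zip(u, exact)))
```

All three branches return approximately the same answer; what differs is cost, and which branch is cheapest can change with grid size and tolerance, which is the kind of decision left to the autotuner.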

[Figure: multigrid choice space, grid size (128, 64, 32, 16) vs. time, with the SOR Iteration and Direct Solve steps labeled]

Autotuned V-cycle shapes

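[Figure: autotuned V-cycle shapes for accuracy levels 10^1, 10^3, 10^5 and 10^7, each plotted against grid size (2048, 1024, 512, 256, 128, 64, 32, 16)]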
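
One way to read these shapes is as the output of a recursive plan that says, for each grid size, what to do at that level. The sketch below uses a hypothetical representation; the `Plan` format and example values are illustrative, not taken from the tuned results. Each level either solves directly, runs a number of SOR sweeps, or pre-smooths, recurses to the half-size grid, and then post-smooths. Rendering the steps with grid size on the vertical axis and time running left to right gives a text version of a V-cycle shape.

```python
from typing import Dict, List, Tuple

# Hypothetical per-level plan: ("direct",), ("sor", sweeps) or
# ("recurse", pre_sweeps, post_sweeps) to descend to the half-size grid.
Plan = Dict[int, Tuple]
Step = Tuple[int, str]  # (grid size, action): "S" = SOR sweep, "D" = direct solve


def vcycle_shape(n: int, plan: Plan) -> List[Step]:
    """Sequence of steps, in time order, that the plan executes from grid n."""
    kind = plan[n][0]
    if kind == "direct":
        return [(n, "D")]
    if kind == "sor":
        return [(n, "S")] * plan[n][1]
    if kind == "recurse":
        pre, post = plan[n][1], plan[n][2]
        return [(n, "S")] * pre + vcycle_shape(n // 2, plan) + [(n, "S")] * post
    raise ValueError(f"unknown action {kind!r} at grid size {n}")


def render(steps: List[Step]) -> str:
    """One row per grid size (largest first); columns are steps in time order."""
    sizes = sorted({size for size, _ in steps}, reverse=True)
    rows = []
    for size in sizes:
        cells = "".join(act if s == size else "." for s, act in steps)
        rows.append(f"{size:>5} {cells}")
    return "\n".join(rows)


if __name__ == "__main__":
    plan: Plan = {
        128: ("recurse", 1, 1),
        64: ("recurse", 2, 1),
        32: ("sor", 3),
    }
    print(render(vcycle_shape(128, plan)))
```

Comparing such renderings for plans tuned to different accuracy targets is one way to read the differences between the shapes in the figure.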

Autotuned bin packing algorithms

Outline

1 Motivating Example

2 PetaBricks Language Overview

3 Variable Accuracy

4 Autotuner

5 Results

6 Conclusions

Conclusions

Motivating goal of PetaBricks

Make programs future-proof by allowing them to adapt to their environment.

We can do better than hard coded constants!

Thanks!

Questions?

http://projects.csail.mit.edu/petabricks/
