An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design

Bill Thies (1) and Saman Amarasinghe (2)
(1) Microsoft Research India, (2) Massachusetts Institute of Technology
PACT 2010

Page 1: groups.csail.mit.edu/commit/papers/2010/thies-pact10-slides.pdf

An Empirical Characterization of Stream Programs and its Implications for Language and Compiler Design

Bill Thies1 and Saman Amarasinghe2

1 Microsoft Research India
2 Massachusetts Institute of Technology

PACT 2010

Page 2

What Does it Take to Evaluate a New Language?

[Figure: bar chart comparing languages by lines of code. Bars: StreamIt (PACT'10), AG (LDTA'06), Contessa (FPT'07), RASCAL (SCAM'09), Anne (PLDI'10), NDL (LCTES'04), Teapot (PLDI'96), UR (PLDI'10), Facile (PLDI'01). X-axis: Lines of Code, ticks at 1000 and 2000.]

Page 3

What Does it Take to Evaluate a New Language?

Small studies make it hard to assess:
- Experiences of new users over time
- Common patterns across large programs

[Figure: bar chart comparing languages by lines of code. Bars: StreamIt (PACT'10), AG (LDTA'06), Contessa (FPT'07), RASCAL (SCAM'09), Anne (PLDI'10), NDL (LCTES'04), Teapot (PLDI'96), UR (PLDI'10), Facile (PLDI'01). X-axis: Lines of Code, ticks at 1000 and 2000.]

Page 4

What Does it Take to Evaluate a New Language?

[Figure: bar chart comparing languages by lines of code, rescaled. Bars: StreamIt (PACT'10), AG (LDTA'06), Contessa (FPT'07), RASCAL (SCAM'09), Anne (PLDI'10), NDL (LCTES'04), Teapot (PLDI'96), UR (PLDI'10), Facile (PLDI'01). X-axis: Lines of Code, 0 to 34,000 (ticks at 0, 10K, 20K, 30K).]

Page 6

What Does it Take to Evaluate a New Language?

Our characterization:
- 65 programs
- 34,000 lines of code
- Written by 22 students
- Over period of 8 years

This allows:
- Non-trivial benchmarks
- Broad picture of application space
- Understanding long-term user experience

[Figure: bar chart comparing languages by lines of code. Bars: StreamIt (PACT'10), AG (LDTA'06), Contessa (FPT'07), RASCAL (SCAM'09), Anne (PLDI'10), NDL (LCTES'04), Teapot (PLDI'96), UR (PLDI'10), Facile (PLDI'01). X-axis: Lines of Code, 0 to 34,000 (ticks at 0, 10K, 20K, 30K).]

Page 7

Streaming Application Domain

• For programs based on streams of data
– Audio, video, DSP, networking, and cryptographic processing kernels
– Examples: HDTV editing, radar tracking, microphone arrays, cell phone base stations, graphics

• Properties of stream programs
– Regular and repeating computation
– Independent filters with explicit communication

[Figure: stream graph with node labels AtoD, FMDemod, LPF, Duplicate, LPF1 LPF2 LPF3, HPF1 HPF2 HPF3, RoundRobin, Adder, Speaker]

Page 8

StreamIt: A Language and Compiler for Stream Programs

• Key idea: design language that enables static analysis

• Goals:
1. Improve programmer productivity in the streaming domain
2. Expose and exploit the parallelism in stream programs

• Project contributions:
– Language design for streaming [CC'02, CAN'02, PPoPP'05, IJPP'05]
– Automatic parallelization [ASPLOS'02, G.Hardware'05, ASPLOS'06, MIT'10]
– Domain-specific optimizations [PLDI'03, CASES'05, MM'08]
– Cache-aware scheduling [LCTES'03, LCTES'05]
– Extracting streams from legacy code [MICRO'07]
– User + application studies [PLDI'05, P-PHEC'05, IPDPS'06]

Page 9

StreamIt Language Basics

• High-level, architecture-independent language
– Backend support for uniprocessors, multicores (Raw, SMP), cluster of workstations

• Model of computation: synchronous dataflow [Lee & Messerschmitt, 1987]
– Program is a graph of independent filters
– Filters have an atomic execution step with known input / output rates
– Compiler is responsible for scheduling and buffer management

• Extensions to synchronous dataflow
– Dynamic I/O rates
– Support for sliding window operations
– Teleport messaging [PPoPP'05]

[Figure: pipeline Input → Decimate → Output with multiplicities x 10, x 1, x 1; rate labels 1 on Input, 10 and 1 on Decimate, 1 and 1 on Output]

Page 10

Example Filter: Low Pass Filter

  float->float filter LowPassFilter (int N, float[N] weights) {

    work peek N push 1 pop 1 {
      float result = 0;
      for (int i=0; i<weights.length; i++) {
        result += weights[i] * peek(i);
      }
      push(result);
      pop();
    }
  }

Stateless filter (peek window of N items)

Page 11

Example Filter: Low Pass Filter

  float->float filter LowPassFilter (int N) {
    float[N] weights;

    work peek N push 1 pop 1 {
      float result = 0;
      weights = adaptChannel();
      for (int i=0; i<weights.length; i++) {
        result += weights[i] * peek(i);
      }
      push(result);
      pop();
    }
  }

Stateful filter (peek window of N items)

Page 12

Structured Streams

• Each structure is single-input, single-output
• Hierarchical and composable

[Figure: the stream constructs]
– filter
– pipeline (each stage may be any StreamIt language construct)
– splitjoin (splitter, parallel branches, joiner)
– feedback loop (joiner, loop body, splitter)
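As an illustration of how single-input, single-output constructs compose hierarchically, here is a minimal sketch in Python rather than StreamIt. The helper names `pipeline`, `splitjoin_duplicate`, and `map_filter` are hypothetical and are not StreamIt APIs. The sketch assumes streams are finite lists, the splitter duplicates its input, the joiner is round-robin taking one item per branch, and it omits the feedback loop.

```python
def map_filter(fn):
    """Hypothetical stateless filter: pop 1, push 1, applying fn to each item."""
    return lambda items: [fn(x) for x in items]


def pipeline(*stages):
    """Compose stages sequentially; the result is itself single-input, single-output."""
    def run(items):
        for stage in stages:
            items = stage(items)
        return items
    return run


def splitjoin_duplicate(*branches):
    """Duplicate splitter feeding every branch, then a round-robin joiner (one item per branch)."""
    def run(items):
        outputs = [branch(list(items)) for branch in branches]
        joined = []
        for group in zip(*outputs):  # stops at the shortest branch output
            joined.extend(group)
        return joined
    return run


add_one = map_filter(lambda x: x + 1)
double = map_filter(lambda x: 2 * x)
negate = map_filter(lambda x: -x)

# A splitjoin nested inside a pipeline: any construct can appear as a stage.
graph = pipeline(add_one, splitjoin_duplicate(double, negate), add_one)
print(graph([1, 2]))
```

Because every combinator returns a function with the same list-in, list-out shape, a splitjoin can be used wherever a filter can, which is the composability property the slide describes.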

Page 13

StreamIt Benchmark Suite (1/2)

• Realistic applications (30):
– MPEG2 encoder / decoder
– Ground Moving Target Indicator
– Mosaic
– MP3 subset
– Medium Pulse Compression Radar
– JPEG decoder / transcoder
– Feature Aided Tracking
– HDTV
– H264 subset
– Synthetic Aperture Radar
– GSM Decoder
– 802.11a transmitter
– DES encryption
– Serpent encryption
– Vocoder
– RayTracer
– 3GPP physical layer
– Radar Array Front End
– Freq-hopping radio
– Orthogonal Frequency Division Multiplexer
– Channel Vocoder
– Filterbank
– Target Detector
– FM Radio
– DToA Converter

Page 14

StreamIt Benchmark Suite (2/2)

• Libraries / kernels (23):
– Autocorrelation
– Cholesky
– CRC
– DCT (1D / 2D, float / int)
– FFT (4 granularities)
– Lattice
– Matrix Multiplication
– Oversampler
– Rate Convert
– Time Delay Equalization
– Trellis
– VectAdd

• Graphics pipelines (4):
– Reference pipeline
– Phong shading
– Shadow volumes
– Particle system

• Sorting routines (8)
– Bitonic sort (3 versions)
– Bubble Sort
– Comparison counting
– Insertion sort
– Merge sort
– Radix sort

Page 15

3GPP [stream graph figure]

Page 16

802.11a [stream graph figure]

Page 17

Bitonic Sort [stream graph figure]

Page 18

Note to online viewers:
For high-resolution stream graphs of all benchmarks, please see pp. 173-240 of this thesis:
http://groups.csail.mit.edu/commit/papers/09/thies-phd-thesis.pdf

Page 19

Characterization Overview

• Focus on architecture-independent features
– Avoid performance artifacts of the StreamIt compiler
– Estimate execution time statically (not perfect)

• Three categories of inquiry:
1. Throughput bottlenecks
2. Scheduling characteristics
3. Utilization of StreamIt language features

Page 20

Lessons Learned from the StreamIt Language

What we did right
What we did wrong
Opportunities for doing better

Page 21

1. Expose Task, Data, & Pipeline Parallelism

Data parallelism
• Analogous to DOALL loops

Task parallelism

Pipeline parallelism

[Figure: stream graph with a Splitter and a Joiner; label: Task]

Page 22

1. Expose Task, Data, & Pipeline Parallelism

Data parallelism

Task parallelism

Pipeline parallelism

[Figure: stream graph with nested Splitter / Joiner pairs around a Stateless filter; labels: Data, Task, Pipeline]

Page 23

1. Expose Task, Data, & Pipeline Parallelism

Data parallelism
• 74% of benchmarks contain entirely data-parallel filters
• In other benchmarks, 5% to 96% (median 71%) of work is data-parallel

Task parallelism
• 82% of benchmarks contain at least one splitjoin
• Median of 8 splitjoins per benchmark

Pipeline parallelism

[Figure: stream graph with Splitter / Joiner pairs; labels: Data, Task, Pipeline]

Page 24

Characterizing Stateful Filters

763 Filter Types: 94% Stateless, 6% Stateful
49 Stateful Types: 55% Avoidable State, 45% Algorithmic State

Sources of Algorithmic State
– MPEG2: bit-alignment, reference frame encoding, motion prediction, …
– HDTV: Pre-coding and Ungerboeck encoding
– HDTV + Trellis: Ungerboeck decoding
– GSM: Feedback loops
– Vocoder: Accumulator, adaptive filter, feedback loop
– OFDM: Incremental phase correction
– Graphics pipelines: persistent screen buffers

Page 25

Characterizing Stateful Filters

763 Filter Types: 94% Stateless, 6% Stateful
49 Stateful Types: 55% Avoidable State, 45% Algorithmic State
27 Types with "Avoidable State": due to induction variables

Sources of Algorithmic State
– MPEG2: bit-alignment, reference frame encoding, motion prediction, …
– HDTV: Pre-coding and Ungerboeck encoding
– HDTV + Trellis: Ungerboeck decoding
– GSM: Feedback loops
– Vocoder: Accumulator, adaptive filter, feedback loop
– OFDM: Incremental phase correction
– Graphics pipelines: persistent screen buffers

Page 26

2. Eliminate Stateful Induction Variables

763 Filter Types: 94% Stateless, 6% Stateful
49 Stateful Types: 55% Avoidable State, 45% Algorithmic State
27 Types with "Avoidable State": due to induction variables

Sources of Induction Variables
– MPEG encoder: counts frame # to assign picture type
– MPD / Radar: count position in logical vector for FIR
– Trellis: noise source flips every N items
– MPEG encoder / MPD: maintain logical 2D position (row/column)
– MPD: reset accumulator when counter overflows

Opportunity: Language primitive to return current iteration?

Page 27

2. Eliminate Stateful Induction Variables

763 Filter Types: 94% Stateless, 6% Stateful
49 Stateful Types: 55% Avoidable State, 45% Algorithmic State
27 Types with "Avoidable State":
– Due to induction variables
– Due to granularity
– Due to message handlers

Sources of Induction Variables
– MPEG encoder: counts frame # to assign picture type
– MPD / Radar: count position in logical vector for FIR
– Trellis: noise source flips every N items
– MPEG encoder / MPD: maintain logical 2D position (row/column)
– MPD: reset accumulator when counter overflows

Opportunity: Language primitive to return current iteration?
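To make the opportunity concrete, here is a minimal Python sketch (not StreamIt) of the "flips every N items" pattern. It assumes a hypothetical primitive that hands the work function its current iteration index; no such primitive is described in the slides beyond the question above, and the function names are illustrative only.

```python
def flip_stateful(n, num_items):
    """Stateful version: a counter and a sign persist across work invocations."""
    count = 0
    sign = 1
    out = []
    for _ in range(num_items):
        out.append(sign)
        count += 1
        if count == n:  # flip the sign after every n items
            count = 0
            sign = -sign
    return out


def flip_work(n, iteration):
    """Stateless work function: depends only on the (hypothetical) iteration index."""
    return 1 if (iteration // n) % 2 == 0 else -1


def flip_stateless(n, num_items):
    # Each iteration is independent, so these calls could run in any order or in parallel.
    return [flip_work(n, i) for i in range(num_items)]


print(flip_stateful(3, 8))
print(flip_stateless(3, 8))
```

Both versions produce the same stream, but only the second leaves the filter free of mutable fields, which is what allows it to be treated as data-parallel.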

Page 28

3. Expose Parallelism in Sliding Windows

[Figure: an FIR filter reading a sliding window over input items 0 to 11 and producing output items 0, 1]

• Legacy codes obscure parallelism in sliding windows
– In von-Neumann languages, modulo functions or copy/shift operations prevent detection of parallelism in sliding windows

• Sliding windows are prevalent in our benchmark suite
– 57% of realistic applications contain at least one sliding window
– Programs with sliding windows have 10 instances on average
– Without this parallelism, 11 of our benchmarks would have a new throughput bottleneck (work: 3% - 98%, median 8%)
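The contrast between the two styles can be sketched in Python (an illustration, not code from the benchmark suite; all function names are hypothetical). The first function keeps a circular buffer indexed with modulo arithmetic, so each iteration depends on state left by the previous one. The second expresses each output as a pure function of a window of the input, in the spirit of peek, which makes the independence of outputs explicit.

```python
def fir_circular(weights, samples):
    """Legacy style: a circular buffer with modulo indexing carries state between iterations."""
    n = len(weights)
    buf = [0] * n
    head = 0
    out = []
    for count, x in enumerate(samples):
        buf[head] = x
        head = (head + 1) % n
        if count >= n - 1:
            # After the update, the oldest sample sits at buf[head].
            out.append(sum(weights[i] * buf[(head + i) % n] for i in range(n)))
    return out


def fir_window(weights, samples, k):
    """Sliding-window style: output k reads only samples[k : k + N]."""
    return sum(w * samples[k + i] for i, w in enumerate(weights))


def fir_any_order(weights, samples):
    n_out = len(samples) - len(weights) + 1
    # Compute outputs in reverse order to show that no output depends on another.
    results = {k: fir_window(weights, samples, k) for k in reversed(range(n_out))}
    return [results[k] for k in range(n_out)]


print(fir_circular([1, 2], [1, 3, 5, 7]))
print(fir_any_order([1, 2], [1, 3, 5, 7]))
```

The results match, yet only the windowed form lets a tool see, without analyzing the modulo arithmetic, that the outputs can be computed in parallel.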

Page 29

Characterizing Sliding Windows

34 Sliding Window Types

44% FIR Filters (push 1, pop 1, peek N)
– 3GPP, OFDM, Filterbank, TargetDetect, DToA, Oversampler, RateConvert, Vocoder, ChannelVocoder, FMRadio

29% One-item windows (push 1, pop N, peek N+1)
– Mosaic, HDTV, FMRadio, JPEG decode / transcode, Vocoder

27% Miscellaneous
– MP3: reordering (peek >1000)
– 802.11: error codes (peek 3-7)
– Vocoder / A.beam: skip data
– Channel Vocoder: sliding correlation (peek 100)

Page 30

4. Expose Startup Behaviors

• Example: difference encoder (JPEG, Vocoder)

Stateful:

  int->int filter Diff_Encoder() {
    int state = 0;

    work push 1 pop 1 {
      push(peek(0) - state);
      state = pop();
    }
  }

Stateless:

  int->int filter Diff_Encoder() {
    prework push 1 peek 1 {
      push(peek(0));
    }

    work push 1 pop 1 peek 2 {
      push(peek(1) - peek(0));
      pop();
    }
  }

• Required by 15 programs:
– For delay: MPD, HDTV, Vocoder, 3GPP, Filterbank, DToA, Lattice, Trellis, GSM, CRC
– For picture reordering (MPEG)
– For initialization (MPD, HDTV, 802.11)
– For difference encoder or decoder: JPEG, Vocoder
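The equivalence of the two difference encoders can be checked with a small Python sketch. It is an emulation over finite lists, not StreamIt, and it assumes that the startup step forwards the first item without consuming it, so that the steady-state step can still peek at it; the function names are illustrative.

```python
def diff_encode_stateful(samples):
    """Mirrors the stateful filter: the previous item is kept in a field."""
    state = 0
    out = []
    for x in samples:
        out.append(x - state)  # push(peek(0) - state)
        state = x              # state = pop()
    return out


def diff_encode_stateless(samples):
    """Mirrors the prework/work filter: no field, only a two-item window."""
    if not samples:
        return []
    out = [samples[0]]  # startup step: push(peek(0)), nothing consumed
    for k in range(len(samples) - 1):
        # steady-state firing k: push(peek(1) - peek(0)), then pop()
        out.append(samples[k + 1] - samples[k])
    return out


print(diff_encode_stateful([3, 5, 4, 10]))
print(diff_encode_stateless([3, 5, 4, 10]))
```

The startup step handles the one output that has no predecessor, so the steady-state step needs no mutable state and each of its firings depends only on its input window.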

Page 31

5. Surprise: Mis-Matched Data Rates Uncommon

[Figure: CD-DAT benchmark, a pipeline of four filters with I/O rates 1 2, 3 2, 7 8, 7 5 and multiplicities x 147, x 98, x 28, x 32]

Converts CD audio (44.1 kHz) to digital audio tape (48 kHz)

• This is a driving application in many papers
– E.g.: [MBL94] [TZB99] [BB00] [BML95] [CBL01] [MB04] [KSB08]
– Due to large filter multiplicities, clever scheduling is needed to control code size, buffer size, and latency

• But are mis-matched rates common in practice? No!
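The multiplicities in the CD-DAT figure follow from the balance equations of synchronous dataflow: on every channel, items produced per steady state must equal items consumed. The Python sketch below solves these equations for a linear pipeline. It assumes the rate labels in the figure read as (pop, push) pairs per filter, which is an interpretation of the extracted figure text; `pipeline_multiplicities` is a hypothetical helper, not part of the StreamIt compiler.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def pipeline_multiplicities(rates):
    """Return the smallest positive firing counts that balance a linear pipeline.

    rates is a list of (pop, push) pairs, one per filter, in pipeline order.
    """
    counts = [Fraction(1)]
    for (_, push_prev), (pop_next, _) in zip(rates, rates[1:]):
        # Balance equation for the channel between two neighbours:
        #   count_prev * push_prev == count_next * pop_next
        counts.append(counts[-1] * push_prev / pop_next)
    # Scale the rational solution to integers using the LCM of the denominators.
    scale = reduce(lambda a, b: a * b // gcd(a, b), (c.denominator for c in counts), 1)
    return [int(c * scale) for c in counts]


cd_dat = [(1, 2), (3, 2), (7, 8), (7, 5)]
print(pipeline_multiplicities(cd_dat))
```

Under that reading, the result reproduces the multiplicities on the slide, and one steady state consumes 147 items and produces 32 × 5 = 160, matching the 44.1 kHz to 48 kHz ratio. A pipeline whose filters all share one rate would yield all-ones multiplicities, which is the common case the following slides report.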

Page 32

5. Surprise: Mis-Matched Data Rates Uncommon

[Figure: excerpt from JPEG transcoder; annotation: Execute once per steady state]

Page 33

Characterizing Mis-Matched Data Rates

• In our benchmark suite:
– 89% of programs have a filter with a multiplicity of 1
– On average, 63% of filters share the same multiplicity
– For 68% of benchmarks, the most common multiplicity is 1

• Implication for compiler design: Do not expect advanced buffering strategies to have a large impact on average programs
– Example: Karczmarek, Thies, & Amarasinghe, LCTES'03
– Space saved on CD-DAT: 14x
– Space saved on other programs (median): 1.2x

Page 34

6. Surprise: Multi-Phase Filters Cause More Harm than Good

• A multi-phase filter divides its execution into many steps
– Formally known as cyclo-static dataflow
– Possible benefits:
  • Shorter latencies
  • More natural code

[Figure: filter F shown in two phases, Step 1 and Step 2, with rate labels 1, 2 and 1, 3]

• We implemented multi-phase filters, and we regretted it
– Programmers did not understand the difference between a phase of execution and a normal function call
– Compiler was complicated by the presence of phases

• However, phases proved important for splitters / joiners
– Routing items needs to be done with minimal latency
– Otherwise buffers grow large, and deadlock in one case (GSM)

Page 35

7. Programmers Introduce Unnecessary State in Filters

• Programmers do not implement things how you expect

Stateful:

  void->int filter SquareWave() {
    int x = 0;

    work push 1 {
      push(x);
      x = 1 - x;
    }
  }

Stateless:

  void->int filter SquareWave() {
    work push 2 {
      push(0);
      push(1);
    }
  }

• Opportunity: add a "stateful" modifier to filter decl?
– Require programmer to be cognizant of the cost of state

Page 36

8. Leverage and Improve Upon Structured Streams

• Overall, programmers found it useful and tractable to write programs using structured streams
– Syntax is simple to write, easy to read

• However, structured streams are occasionally unnatural
– And, in rare cases, insufficient

Page 37

8. Leverage and Improve Upon Structured Streams

[Figure: a stream graph shown in two forms, Original and Structured]

Compiler recovers unstructured graph using synchronization removal [Gordon 2010]

Page 38

8. Leverage and Improve Upon Structured Streams

[Figure: a stream graph shown in two forms, Original and Structured]

• Characterization:
– 49% of benchmarks have an Identity node
– In those benchmarks, Identities account for 3% to 86% (median 20%) of instances

• Opportunity:
– Bypass capability (a la GOTO) for streams

Page 39

Related Work

• Benchmark suites in von-Neumann languages often include stream programs, but lose high-level properties
– MediaBench
– ALPBench
– HandBench
– MiBench
– SPEC
– PARSEC
– Berkeley MM Workload
– NetBench
– Perfect Club

• Brook language includes 17K LOC benchmark suite
– Brook disallows stateful filters; hence, more data parallelism
– Also more focus on dynamic rates & flexible program behavior

• Other stream languages lack benchmark characterization
– StreamC / KernelC
– Cg
– Baker
– SPUR
– Spidle

• In-depth analysis of 12 StreamIt "core" benchmarks published concurrently to this paper [Gordon 2010]

Page 40

Conclusions

• First characterization of a streaming benchmark suite that was written in a stream programming language
– 65 programs; 22 programmers; 34 KLOC

• Implications for streaming languages and compilers:
– DO: expose task, data, and pipeline parallelism
– DO: expose parallelism in sliding windows
– DO: expose startup behaviors
– DO NOT: optimize for unusual case of mis-matched I/O rates
– DO NOT: bother with multi-phase filters
– TRY: to prevent users from introducing unnecessary state
– TRY: to leverage and improve upon structured streams
– TRY: to prevent induction variables from serializing filters

• Exercise care in generalizing results beyond StreamIt

Page 41

Acknowledgments: Authors of the StreamIt Benchmarks

• Sitij Agrawal
• Basier Aziz
• Jiawen Chen
• Matthew Drake
• Shirley Fung
• Michael Gordon
• Ola Johnsson
• Andrew Lamb
• Chris Leger
• Michal Karczmarek
• David Maze
• Ali Meli
• Mani Narayanan
• Satish Ramaswamy
• Rodric Rabbah
• Janis Sermulins
• Magnus Stenemo
• Jinwoo Suh
• Zain ul-Abdin
• Amy Williams
• Jeremy Wong