The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea...

The Hardness of Cache Conscious Data Placement

Erez Petrank, TechnionDror Rawitz, Caesarea Rothschild Institute

Appeared in29th ACM Conference on Principles of Programming

Languages Portland, Oregon, January 16, 2002

Agenda Background & motivation The problem of cache conscious

data / code placement is: extremely difficult in various models

Positive matching results (weak…). Some proof techniques and details Conclusion

Cache Structure Large memory divided

into blocks. Small cache - k blocks. Mapping of memory

blocks to cache blocks. (e.g., modulus function)

Cache hit - accessed block is in the cache.

Cache miss - required block must be read to cache from memory.

Direct mapping

What can we do to improve program cache behavior?

Arrange code / data to minimize cache misses

Write cache-conscious programs

In this work we concentrate on the first.

How do we place data (or code) optimally?

Step 1: Discover future accesses to data. Step 2: Find placement of data that

minimizes the number of cache misses.

Step 3: Rearranged the data in memory. Step 4: Run program.

Some “minor” problems: In Step 1: We cannot tell the future In Step 2: We don’t know how to do that

Step 1: Discover future accesses to data Static analysis. Profiling. Runtime monitoring.

This work:Even if future accesses are known exactly,Step 2 (placing data optimally) is extremely

difficult.

The Problem Input: a set of objects O={o1,…,om}, and a

sequence of accesses =(1,…,n).

E.g. = (o1,o3,o7,o1,o2,o1,o3,o4,o1).

Solution: a placement, f:ON. Measure: number of misses.

We want: placement of o1,…,om in memory that obtains minimum number of cache misses (over all possible placements).

Our Results

Can we (efficiently) find an optimal placement?

No! Unless, P=NP.

Our Results

Can we (efficiently) find an “almost” optimal placement? Almost = # misses twice the optimum

No! Unless, P=NP.

Can we (eff.) find “fairly” optimal placement? Fairly = # misses 100 times the optimum No! Unless, P=NP.

Our Results

Can we (eff.) find a “reasonable” placement? reasonable = # misses log(n) the optimum No! Unless, P=NP.

Can we (eff.) find an “acceptable” placement? Acceptable = # misses n0.99 times the optimum No! Unless, P=NP.

The Main Theorem

Let ε be any real number, 0<ε<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1-) from the optimum, then P=NP.

(Theorem holds for caches with > 2 blocks)

Extend to t-way Associative Caches

t-way Associative Caches:

t·k blocks in cache, k sets, t blocks in a set.

memory block mapped to a set.

Inside a set: a replacement protocol.

Theorem 2: same hardness holds for t-way associative cache systems.

Result is “robust” Holds for a variety of models. E.g.,

Mapping of memory block to cache is not by modulus,

Replacement policy is not standard, Object sizes are fixed, (or they are

not), Objects must be aligned to cache

blocks, (or not), Etc…

More Difficulties:Pairwise Information

A practical problem: sequence of accesses is long. Processing it is costly.

Solution in previous work: keep relations between pairs of objects. E.g., for each pair how beneficial is putting

them in the same memory block. E.g., for each pair how many misses would

be caused by mapping the two to the same cache block

Pairwise Information is Lossy

Conclusion:Even when given unrestricted time, finding an optimal pairwise placement is a bad idea (worst case).

Theorem 3:There exists a sequence such that

#misses(f) (k-3) #misses(f*)f - optimal pairwise placemetf* - optimal placement

Pairwise Information:Hardness Result

Theorem 4: Let ε be any real number, 0<ε<1. If there is a polynomial time algorithm that finds a placement which is within a factor of n(1- ε) from the optimum with respect to pairwise information, then P=NP.

Proof is similar to the direct mapping case.

A Simple Observation Input:

Objects O={o1,…,om}, and access sequence =(1,…,n).

Any placement yields at most n cache misses.

Any placement yields at least 1 cache miss. Therefore, any placement is within a factor of

n from the optimum. (Recall: a solution within n(1-ε) is not

possible.)

What about positive results ?

In light of the lower bound not much can be done in general. Yet…

Theorem 5: There exists a polynomial time approximation algorithm that outputs a placement (always) within a factor of from the optimal placement for any c. Compare: impossible: n(1-), possible: n/c

The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea...

Documents

Transcript of The Hardness of Cache Conscious Data Placement Erez Petrank, Technion Dror Rawitz, Caesarea...

Dror Yeger Marketing # 10

Theory of Compilation 236360 Erez Petrank

Dror Mphil Thesis

Dror Yeger Marketing #1

MALCAH YAEGER-DROR et

CURRICULUM VITAE DROR PALEY, MD, FRCSCpaleyinstitute.org/wp-content/uploads/DROR_PALEY_MD... · 2018. 8. 8. · 1 CURRICULUM VITAE DROR PALEY, MD, FRCSC Dror Paley, MD, FRCSC Paley

Determinan Caesarea

Litmus Paper - Dror Israel

Dror Tamir, CEO Dror@Hargol.com @HargolFoodTech · Dror Tamir, CEO Dror@Hargol.com. 10 8 6 4 2 1800 1900 2000 2050 9 Billion Global population by 2050 DOUBLE Protein demand to. Plant

CURRICULUM VITAE DROR PALEY, MD, FRCSClengthening.us/Dror_Paley_CV_Updated_5_21_09website.pdf · 2009-05-26 · 1 CURRICULUM VITAE DROR PALEY, MD, FRCSC Dror Paley, MD, FRCSC Paley

Urban Renewal_irit dror architects

Nazareth, the Caesarea Inscription, and the hand of God the Caesarea Inscription, and the hand of God.pdf1 Nazareth, the Caesarea Inscription, and the hand of God by Enrico Tuccinardi

Caesarea Power Point

Theory of Compilation 236360 Erez Petrank

©2003 Dror Feitelson Parallel Computing Systems Part III: Job Scheduling Dror Feitelson Hebrew University.

Theory of Compilation 236360 Erez Petrank Lecture 11: Optimizations 1.

Caesarea Philippi - Sanctuary Stringssanctuarystrings.com/downloads/cvha/My_Saviours_Love_Complete.pdf · Caesarea Philippi . Violin Solo . COVER: My Saviour’s Love. The area of

SECTIO CAESAREA AT CPD

A History of Dror

Theory of Compilation 236360 Erez Petrank Lecture 8: Runtime. 1.