Issue With Memory


Issues with memory and cache in OpenMP

MEMORY ISSUES:

i. Bandwidth

ii. Working in the cache

iii. Memory contention

(i) BANDWIDTH:

To conserve bandwidth, pack data more tightly, or move it less frequently between cores. Packing the data tighter is usually straightforward, and benefits sequential execution as well. For example, when declaring structures in C/C++, declare fields in order of descending size. This strategy tends to minimize the extra padding that the compiler must insert to maintain alignment requirements, as the sketch below exemplifies.
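A minimal sketch of the idea follows; the struct and field names are hypothetical, and the sizes assume a typical 64-bit platform where double requires 8-byte alignment.

#include <cstdint>
#include <cstdio>

struct Ascending {        // fields in ascending size order
    char    flag;         // 1 byte, then 7 bytes of padding before 'value'
    double  value;        // 8 bytes, must be 8-byte aligned
    int32_t id;           // 4 bytes, then 4 bytes of tail padding
};                        // typically 24 bytes

struct Descending {       // same fields, descending size order
    double  value;        // 8 bytes
    int32_t id;           // 4 bytes
    char    flag;         // 1 byte, then only 3 bytes of tail padding
};                        // typically 16 bytes

int main() {
    printf( "Ascending: %zu bytes, Descending: %zu bytes\n",
            sizeof(Ascending), sizeof(Descending) );
    return 0;
}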

Some compilers also support "#pragma pack" directives that pack structures even more tightly, possibly by removing all padding.

Such very tight packing may be counterproductive, however, because it causes misaligned loads and stores that may be significantly slower than aligned loads and stores.
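As a hedged illustration: GCC, Clang, and MSVC all accept this push/pop form of the directive, though the exact semantics are compiler-specific.

#pragma pack(push, 1)     // request 1-byte packing: no padding at all
struct Tight {
    char   flag;          // offset 0
    double value;         // offset 1: misaligned, loads may be slower
    int    id;            // offset 9
};                        // 13 bytes total
#pragma pack(pop)         // restore the default packing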

(ii) WORKING IN THE CACHE:

Moving data less frequently is a more subtle exercise than packing, because mainstream programming languages do not have explicit commands to move data between a core and memory.

Data movement arises from the way the cores read and write memory. There are two categories of interactions to consider: those between cores and memory, and those between cores. Data movement between a core and memory also occurs in single-core processors, so minimizing data movement benefits sequential programs as well. Numerous techniques exist. For example, a technique called cache-oblivious blocking recursively divides a problem into smaller and smaller subproblems. Eventually the subproblems become so small that they each fit in cache.
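A minimal sketch of the recursion, using matrix transposition as the subproblem (the text names no specific problem; the cutoff BASE and the function name are assumptions):

const long BASE = 32;   // assumed cache-fitting cutoff; tune per machine

// Transpose the half-open block [r0,r1) x [c0,c1) of the n x n matrix
// 'in' into 'out', recursively splitting the longer dimension until
// the tile is small enough to fit in cache.
void CacheObliviousTranspose( const double* in, double* out, long n,
                              long r0, long r1, long c0, long c1 ) {
    if( r1-r0 <= BASE && c1-c0 <= BASE ) {
        for( long r=r0; r<r1; ++r )        // base case: the tile fits
            for( long c=c0; c<c1; ++c )    // in cache, so transpose it
                out[c*n+r] = in[r*n+c];    // directly
    } else if( r1-r0 >= c1-c0 ) {
        long rm = (r0+r1)/2;               // split the longer dimension
        CacheObliviousTranspose( in, out, n, r0, rm, c0, c1 );
        CacheObliviousTranspose( in, out, n, rm, r1, c0, c1 );
    } else {
        long cm = (c0+c1)/2;
        CacheObliviousTranspose( in, out, n, r0, r1, c0, cm );
        CacheObliviousTranspose( in, out, n, r0, r1, cm, c1 );
    }
}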

Another technique for reducing the cache footprint is to reorder steps in the code. Sometimes this is as simple as interchanging loops. Other times it requires more significant restructuring.
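For instance, a hedged sketch of loop interchange (the array size and function names are illustrative): summing a row-major array column by column strides across memory, while interchanging the loops gives unit-stride access that stays in cache.

const long N = 4096;

// Cache-unfriendly: the inner loop strides N doubles between accesses,
// touching a new cache line on nearly every iteration.
double SumByColumns( const double a[][N], long n ) {
    double total = 0;
    for( long j=0; j<N; ++j )
        for( long i=0; i<n; ++i )
            total += a[i][j];
    return total;
}

// Cache-friendly: same arithmetic with the loops interchanged, so the
// inner loop walks memory with unit stride and reuses each cache line.
double SumByRows( const double a[][N], long n ) {
    double total = 0;
    for( long i=0; i<n; ++i )
        for( long j=0; j<N; ++j )
            total += a[i][j];
    return total;
}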


The Sieve of Eratosthenes is an elementary programming exercise that demonstrates such restructuring and its benefits.

Cache-Friendly Sieve of Eratosthenes:

#include <cmath>
#include <cstring>
#include <algorithm>

// Strike is referenced but not defined in the transcript; this is the
// usual helper: mark every 'stride'-th entry as composite starting at
// 'i', and return the first index past 'limit' so the walk can resume.
static long Strike( bool composite[], long i, long stride, long limit ) {
    for( ; i<=limit; i+=stride )
        composite[i] = true;
    return i;
}

// Function signature reconstructed from the title and trailing brace.
long CacheFriendlySieve( long n ) {
    long count = 0;
    long m = (long)sqrt((double)n);
    bool* composite = new bool[n+1];
    memset( composite, 0, n+1 );   // was n; composite[n] must be cleared too
    long* factor = new long[m];
    long* striker = new long[m];
    long n_factor = 0;
    for( long i=2; i<=m; ++i )
        if( !composite[i] ) {
            ++count;
            striker[n_factor] = Strike( composite, 2*i, i, m );
            factor[n_factor++] = i;
        }
    // Chops sieve into windows of size sqrt(n)
    for( long window=m+1; window<=n; window+=m ) {
        long limit = std::min( window+m-1, n );
        for( long k=0; k<n_factor; ++k )
            // Strike walks window of size sqrt(n) here.
            striker[k] = Strike( composite, striker[k], factor[k], limit );
        for( long i=window; i<=limit; ++i )
            if( !composite[i] )
                ++count;
    }
    delete[] striker;
    delete[] factor;
    delete[] composite;
    return count;
}

The restructuring introduces extra complexity and bookkeeping operations.

(iii) MEMORY CONTENTION:

For multi-core programs, working within the cache becomes trickier, because data is transferred not only between a core and memory, but also between cores. As with transfers to and from memory, mainstream programming languages do not make these transfers explicit. The transfers arise implicitly from patterns of reads and writes by different cores. The patterns correspond to two types of data dependencies:

(a) Read-write dependency. A core writes a cache line, and then a different core reads it.


(b) Write-write dependency. A core writes a cache line, and then a different core writes it.

An interaction that does not cause data movement is two cores repeatedly reading a cache line that is not being written.

Thus if multiple cores only read a cache line and do not write it, then no memory bandwidth is consumed. Each core simply keeps its own copy of the cache line.

To minimize memory bus traffic, minimize core interactions by minimizing shared locations. Hence, the same patterns that tend to reduce lock contention also tend to reduce memory traffic, because it is the shared state that requires locks and generates contention.
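As a hedged OpenMP illustration of a write-write dependency and its removal (the function names, the thread-count bound, and the 64-byte cache-line size are assumptions):

#include <omp.h>

const int MAX_THREADS = 64;          // assumed upper bound on threads

struct PaddedCounter {
    long value;
    char pad[64 - sizeof(long)];     // assumed 64-byte cache lines
};

// Contended version: the per-thread counters are adjacent, so several
// share one cache line, and every increment by one core invalidates
// the copies held by the others (false sharing).
long CountEvensContended( const long* a, long n ) {
    long counts[MAX_THREADS] = {};
    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        #pragma omp for
        for( long i=0; i<n; ++i )
            if( a[i]%2 == 0 )
                ++counts[t];         // shared cache line ping-pongs
    }
    long total = 0;
    for( int t=0; t<MAX_THREADS; ++t )
        total += counts[t];
    return total;
}

// Padded version: each counter occupies its own cache line, so no line
// is written by more than one core and the contention disappears.
long CountEvensPadded( const long* a, long n ) {
    PaddedCounter counts[MAX_THREADS] = {};
    #pragma omp parallel
    {
        int t = omp_get_thread_num();
        #pragma omp for
        for( long i=0; i<n; ++i )
            if( a[i]%2 == 0 )
                ++counts[t].value;   // each core writes its own line
    }
    long total = 0;
    for( int t=0; t<MAX_THREADS; ++t )
        total += counts[t].value;
    return total;
}

In practice an OpenMP reduction clause achieves the same effect more idiomatically; the explicit per-thread counters above are only to make the cache-line traffic visible.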