
Implementation of Morton Layout for Large Arrays

Presented by: Sharad Ratna Bajracharya

Advisor: Prof. Larry Dunning

23rd April 2004

Bowling Green State University

Outline

• Introduction

• Objectives

• Implementation

• Samples

• Improvement

• Recommendation

• Conclusion

Introduction

• Morton layout is a storage layout for two-dimensional arrays.

• For large arrays, Morton layout can perform better than row-major or column-major array representation.

Introduction continues...

• Reports analyzing the performance and efficiency of Morton layout:

– An exhaustive evaluation of row-major, column-major and Morton Layouts for large two-dimensional arrays; Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.

– Is Morton Layout competitive for large two-dimensional arrays?; Jeyarajan Thiyagalingam and Paul H. J. Kelly.

– Improving the Performance of Morton Layout by Array Alignment and Loop Unrolling; Jeyarajan Thiyagalingam, Olav Beckmann, Paul H. J. Kelly.

Introduction continues...

• General row-major array representation

– Row-major ordering assigns successive elements, moving across each row and then down to the next row, to successive memory locations.

0  1  2  3
4  5  6  7
8  9  10 11
12 13 14 15

Introduction continues...

• Column Major array representation.

0 4 8 12

1 5 9 13

2 6 10 14

3 7 11 15

Introduction continues...

• Morton layout is a compromise storage layout between the layouts mandated by programming languages, such as row-major and column-major.

(Row Major)      (Morton Storage Layout)
0  1  2  3       0  1  4  5
4  5  6  7       2  3  6  7
8  9  10 11      8  9  12 13
12 13 14 15      10 11 14 15

Introduction continues...

• Morton storage layout works with almost equal overhead whether traversed row-wise or column-wise.

• Morton layout works well with square two-dimensional arrays whose size is a power of 2, such as 2x2, 4x4, 8x8, etc.

Introduction continues...

• For a non-square matrix, it wastes a lot of memory space (X marks a wasted location).

(Row Major)      (Morton Storage Layout)
0  1  2  3       0  1  4  5
4  5  6  7       2  3  6  7
8  9  10 11      8  9  X  X
                 10 11

Introduction continues...

• How does Morton layout work?

– For any subscript of a two-dimensional array, such as array[ 2 , 3 ]:

Binary value of row 2 -> 1 0
Binary value of col 3 -> 1 1

The row bits and column bits are interleaved, so Morton layout stores the element at location 1 1 0 1, i.e. memory location 13.

– Also known as Zip Fastening Array Layout.
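
The zip fastening of row and column bits can be sketched in C++ as follows. This is only an illustration under stated assumptions, not the presenter's implementation: the names `spread_bits` and `morton_index` are hypothetical, and subscripts are assumed to fit in 16 bits so that the result fits in 32 bits. Row bits go to the odd positions and column bits to the even positions, which matches the array[ 2 , 3 ] example.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of x apart so that bit k moves to bit 2k
// (a zero is inserted between every pair of adjacent bits).
inline std::uint32_t spread_bits(std::uint32_t x) {
    x &= 0x0000FFFFu;
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

// Morton location of element [row, col]: row bits occupy the odd bit
// positions and column bits the even ones.
inline std::uint32_t morton_index(std::uint32_t row, std::uint32_t col) {
    assert(row <= 0xFFFFu && col <= 0xFFFFu);  // assumed 16-bit subscripts
    return (spread_bits(row) << 1) | spread_bits(col);
}
```

With these definitions, `morton_index(2, 3)` evaluates to 13, the location worked out by hand above.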

Introduction continues...

• Consider a large row-major array:

1    2    3    4    5    6    7    ... 1000
1001 1002 1003 1004 1005 1006 1007 ... 2000
2001 2002 ...
...
9001 9002 9003 9004 9005 9006 9007 ... 10000

• When such an array is traversed column-wise, successive accesses are a full row (here 1000 elements) apart in memory. The result is cache misses, page faults and poor performance.

Objectives

• Improve the cache-miss and page-fault characteristics of large arrays by using Morton array layouts.

• Reduce wasted memory in Morton layout.

• Improve the extendibility of arrays.

Implementation

• Interleaved bit patterns:

4  -> 0 1 0 0 -> 0 0 1 0 0 0 0
9  -> 1 0 0 1 -> 1 0 0 0 0 0 1
15 -> 1 1 1 1 -> 1 0 1 0 1 0 1
                 (Interleaved Bits)

Implementation continues

• Bit interleaved increment and decrement:

– Bit interleaved increment:

101 + 1 -> 1 0 0 0 1 + 1
110     -> 1 0 1 0 0
(Changes are in interleaved bits)

– For any value “a”, bit interleaved increment is given by:

a+1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555

• 0xAAAAAAAA = 1010......10101010 (32 bits)
• 0x55555555 = 0101......01010101 (32 bits)
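
The increment formula translates directly into C++. The sketch below assumes 32-bit unsigned values whose meaningful bits sit in the even positions; the function name `dilated_increment` is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Bit interleaved increment of a value whose bits sit in the even positions.
inline std::uint32_t dilated_increment(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // a must already be in interleaved form
    // Filling the gap bits with ones lets the carry ripple across them;
    // the final mask clears the gap bits again.
    return ((a | 0xAAAAAAAAu) + 1u) & 0x55555555u;
}
```

For instance, applying it to 1 0 0 0 1 (interleaved 101) yields 1 0 1 0 0 (interleaved 110).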

Implementation continues

– Bit interleaved increment: a+1 = ((a | 0xAAAAAAAA) + 1) & 0x55555555

      0 0 0 1 -> Bit interleaved 1 (0 1)
OR    1 0 1 0
      1 0 1 1
+           1
      1 1 0 0
AND   0 1 0 1
      0 1 0 0 -> Bit interleaved 2 (1 0)

Implementation continues

– More examples of bit interleaved increment:

0 0 0 0 0 + 1 = 0 0 0 0 1
0 0 0 0 1 + 1 = 0 0 1 0 0
0 0 1 0 0 + 1 = 0 0 1 0 1
0 0 1 0 1 + 1 = 1 0 0 0 0
1 0 0 0 0 + 1 = 1 0 0 0 1
1 0 0 0 1 + 1 ...

Implementation continues

– Bit interleaved decrement. For example,

100 - 1 -> 1 0 0 0 0 - 1
11      -> 0 0 1 0 1
(Changes are in interleaved bits)

– For any value “a”, bit interleaved decrement is given by:

a-1 = (a - 1) & 0x55555555

where

• 0x55555555 = 0101......01010101 (32 bits)
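
A matching C++ sketch of the decrement, again assuming 32-bit unsigned values in interleaved form (the name `dilated_decrement` is hypothetical). The subtraction borrows through the zero gap bits, turning them into ones, and the mask removes them afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Bit interleaved decrement of a value whose bits sit in the even positions.
inline std::uint32_t dilated_decrement(std::uint32_t a) {
    assert((a & 0xAAAAAAAAu) == 0);  // a must already be in interleaved form
    assert(a != 0);                  // there is nothing below interleaved 0
    return (a - 1u) & 0x55555555u;
}
```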

Implementation continues

– Bit interleaved decrement: a-1 = (a - 1) & 0x55555555

      0 1 0 0 0 0 -> Bit interleaved 4 (100)
-               1
      0 0 1 1 1 1
AND   0 1 0 1 0 1
      0 0 0 1 0 1 -> Bit interleaved 3 (11)

Implementation continues

– More examples of bit interleaved decrement:

...
1 0 0 0 0 - 1 = 0 0 1 0 1
0 0 1 0 1 - 1 = 0 0 1 0 0
0 0 1 0 0 - 1 = 0 0 0 0 1
0 0 0 0 1 - 1 = 0 0 0 0 0

Implementation continues

• Morton layout array representation can be implemented in two ways:

– The first method is to maintain a lookup table of bit interleaved array subscripts for address calculation. For example,

0 -> 0 0 0 0
1 -> 0 0 0 1
2 -> 0 1 0 0
3 -> 0 1 0 1

Implementation continues

– For example, for an array subscript such as [ 2 , 3 ]:

Value of 2 (1 0) from lookup table -> 0100
Value of 3 (1 1) from lookup table -> 0101

To get the Morton layout address, compute (ROW bitwise shifted left by 1) + COL:

(0100 << 1) + 0101 = 1000 + 0101, that is,

  1 0 0 0
+ 0 1 0 1
  1 1 0 1 (zipped address)
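
The lookup-table method can be sketched as below. This is a hedged illustration rather than the presenter's code: `make_lookup_table` and `zipped_address` are hypothetical names, one table is shared by rows and columns, and the table is filled with the bit interleaved increment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Build a table whose k-th entry is k in bit interleaved form.
inline std::vector<std::uint32_t> make_lookup_table(std::size_t size) {
    std::vector<std::uint32_t> table(size);
    std::uint32_t dilated = 0;
    for (std::size_t k = 0; k < size; ++k) {
        table[k] = dilated;
        dilated = ((dilated | 0xAAAAAAAAu) + 1u) & 0x55555555u;  // interleaved +1
    }
    return table;
}

// Zipped (Morton) address: row entry shifted left by one, plus column entry.
inline std::uint32_t zipped_address(const std::vector<std::uint32_t>& table,
                                    std::size_t row, std::size_t col) {
    assert(row < table.size() && col < table.size());
    return (table[row] << 1) + table[col];
}
```

Looking up [ 2 , 3 ] in a four-entry table gives (0100 << 1) + 0101 = 1101, the zipped address from the worked example.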

Implementation continues

– The second method of implementing Morton array layout uses only bit interleaved increment and decrement, without a lookup table.

Implementation continues

• Implemented in C++ as a two-dimensional array matrix class with Standard Template Library (STL) compatibility, so as to make it generic; that is, it is not tied to any particular data structure or object type.

• Internally, data are stored sequentially in an STL vector.

Implementation continues

• Direct access to an element of the array matrix by array subscript is implemented using the lookup table.

• Random access iterators are defined which make use of bit interleaved increment and decrement without using the lookup table.

– Iterators are a generalization of pointers. They are objects that point to other objects.
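
To illustrate how an iterator can step through Morton-ordered storage without a lookup table, here is a minimal sketch that walks along one row using the bit interleaved increment. It is not the matrix class from the talk: `read_row` is a hypothetical helper, and the caller is assumed to pass the row subscript already interleaved and shifted left by one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy one row out of a Morton-ordered buffer, moving from column to
// column with the bit interleaved increment instead of a lookup table.
template <typename T>
std::vector<T> read_row(const std::vector<T>& storage,
                        std::uint32_t dilated_row,  // interleaved row, shifted left by 1
                        std::size_t columns) {
    std::vector<T> out;
    out.reserve(columns);
    std::uint32_t dilated_col = 0;
    for (std::size_t c = 0; c < columns; ++c) {
        assert(dilated_row + dilated_col < storage.size());
        out.push_back(storage[dilated_row + dilated_col]);
        dilated_col = ((dilated_col | 0xAAAAAAAAu) + 1u) & 0x55555555u;
    }
    return out;
}
```

A real iterator class would keep `dilated_row` and `dilated_col` as members and apply the same update in its `operator++`.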

Implementation continues

• Different types of random access iterators are implemented to provide flexibility in using the matrix class, such as:

– Row Major iterator
– Column Major iterator
– Diagonal iterator
– Row iterator / Super row iterator
– Column iterator / Super column iterator
– Reverse Row Major iterator

Samples

• Using Row Major iterator:

Original Data:
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data:
-9 -9 -8 -8
-8 -8 -7 -6
-6 -5 -4 -4
-2 -2 -2 -1
 1  1  2  3
 5  5  6  7

//Row Major sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
sort(mat1.begin(), mat1.end());
cout << "Sorted Data:" << endl;
cout << mat1 << endl;

Samples continues...

• Using Column Major iterator:

Original Data:
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data:
-9 -7 -2  2
-9 -6 -2  3
-8 -6 -2  5
-8 -5 -1  5
-8 -4  1  6
-8 -4  1  7

//Column Major sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
sort(mat1.cbegin(), mat1.cend());
cout << "Sorted Data:" << endl;
cout << mat1 << endl;

Samples continues...

• Using super row iterator:

Original Data:
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data:
-9 -8 -1  6
-9 -8 -6 -2
-6 -5 -4 -2
-8 -4  2  3
-7 -2  1  5
-8  1  5  7

//Row by row sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
for (riter = mat1.r2rbegin(); riter != mat1.r2rend(); riter++) {
    sort((*riter).begin(), (*riter).end());
}
cout << mat1 << endl;

Samples continues...

• Using super column iterator:

Original Data:
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Sorted Data:
-8 -9 -9 -8
-2 -8 -8 -4
-2 -6 -7 -2
 2 -5 -6 -1
 5  1 -4  5
 6  3  1  7

//Column by column sorting using STL sort()
mat1 = matori;
cout << mat1 << endl;
for (citer = mat1.c2cbegin(); citer != mat1.c2cend(); citer++) {
    sort((*citer).begin(), (*citer).end());
}
cout << mat1 << endl;

Samples continues...

• Using Resize function:

Original Data:
 6 -9 -8 -1
-8 -6 -9 -2
-2 -5 -6 -4
 2  3 -4 -8
-2  1 -7  5
 5 -8  1  7

Resized Data:
 6 -9 -8 -1  0  0
-8 -6 -9 -2  0  0
-2 -5 -6 -4  0  0
 2  3 -4 -8  0  0
-2  1 -7  5  0  0
 5 -8  1  7  0  0
 0  0  0  0  0  0
 0  0  0  0  0  0

//Resizing the matrix
mat1 = matori;
cout << mat1 << endl;
mat1.resize(8, 8, 0);
cout << mat1 << endl;

Improvement

• Morton array representation can be improved if the space wasted for non-square matrices can be utilized.

• This can be achieved to some extent by using partially interleaved bit patterns.

– A portion of the bits is interleaved and the remaining bits are left as they are. This helps in utilizing the wasted space.

Improvement continues

• For example, consider a matrix of size 20 x 4 (actual required space: 80). Using Morton layout, it will require

1000001010 + 0000000101 = 1000001111 = 527 + 1 = 528 spaces

With the modified version, it will require

1001010 + 0000101 = 1001111 = 79 + 1 = 80 spaces -> Improved!

Improvement continues

• More details:

  1000001010 -> 19 (row)
+ 0000000101 -> 3 (col)
  1000001111 -> 527 (Morton location)

  100 10 10 -> 19 (row)
+ 000 01 01 -> 3 (col)
  100 11 11 -> 79 (Improved Morton)

Extra interleaving bits removed

Improvement continues

• In the improved version, only N bits are interleaved, where N is the number of bits needed to represent the smaller of “rows - 1” and “columns - 1” in a rows x columns matrix.

• For example, in a 20x4 matrix, the smaller dimension is 4, and 4 - 1 = 3, which is “11” in binary; that is, N=2, as 3 is represented by the 2 bits “11”.

Improvement continues

• Interleaving N bits and leaving the remaining bits as they are. For example, for rows = 20 - 1 = 19 = 10011:

100 10 10 -> 2 bits are interleaved (N=2 row interleaved bits)

For columns = 4 - 1 = 3 = 11:

000 01 01 -> 2 bits are interleaved (N=2 column interleaved bits)
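
The partial interleaving can be sketched in C++ as follows. Both function names are hypothetical, and the sketch covers only the case shown in the 20x4 example, where every column subscript fits in the n interleaved bits and only the row has remaining bits.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits needed to represent the smaller of (rows - 1) and (cols - 1).
inline unsigned interleaved_bit_count(std::uint32_t rows, std::uint32_t cols) {
    assert(rows >= 1 && cols >= 1);
    std::uint32_t v = (rows < cols ? rows : cols) - 1u;
    unsigned n = 0;
    while (v != 0) { ++n; v >>= 1; }
    return n;
}

// Interleave only the low n bits of row and column; the remaining row
// bits are placed unchanged above the 2n interleaved bits.
inline std::uint32_t partial_morton_index(std::uint32_t row, std::uint32_t col,
                                          unsigned n) {
    assert(n <= 8);
    assert(col < (1u << n));  // assumption: the column fits in n bits
    std::uint32_t low = 0;
    for (unsigned k = 0; k < n; ++k) {
        low |= ((row >> k) & 1u) << (2 * k + 1);  // row bit -> odd position
        low |= ((col >> k) & 1u) << (2 * k);      // column bit -> even position
    }
    const std::uint32_t high = row >> n;          // bits left as they are
    return (high << (2 * n)) | low;
}
```

For the 20x4 matrix, `interleaved_bit_count(20, 4)` gives 2, and `partial_morton_index(19, 3, 2)` gives 79, so the last element lands in the 80th location.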

Improvement continues

• Bit interleaved increment/decrement still works.

– For bit interleaved increment:

      001 1010 -> Bit interleaved 7 (111)
OR    000 0101 -> Bit Mask
      001 1111
+            1
      010 0000
AND   111 1010 -> ~ Bit Mask (complement)
      010 0000 -> Bit interleaved 8 (1000)

Improvement continues

– For bit interleaved decrement:

      010 0000 -> Bit interleaved 8 (1000)
-            1
      001 1111
AND   111 1010 -> ~ Bit Mask
      001 1010 -> Bit interleaved 7 (111)
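
The two worked examples generalize to any bit mask. In the sketch below (hypothetical names), `mask` has a one in every bit position that belongs to the other subscript; for a row value in the 20x4 example that is 000 0101.

```cpp
#include <cassert>
#include <cstdint>

// Increment a partially interleaved value: fill the foreign positions
// with ones so the carry passes through them, then clear them again.
inline std::uint32_t partial_increment(std::uint32_t a, std::uint32_t mask) {
    assert((a & mask) == 0);  // a must not use the masked positions
    return ((a | mask) + 1u) & ~mask;
}

// Decrement a partially interleaved value: the borrow sets the foreign
// positions to one, and the complement mask clears them.
inline std::uint32_t partial_decrement(std::uint32_t a, std::uint32_t mask) {
    assert((a & mask) == 0 && a != 0);
    return (a - 1u) & ~mask;
}
```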

Improvement continues

• The improved array location is calculated by adding the partially bit interleaved row and column.

  100 10 10 -> 19
+ 000 01 01 -> 3
  100 11 11 = 79

• This method utilizes the wasted space to some extent, but it does not work better than the original Morton layout for square matrices whose size is not a power of 2.

Improvement continues

• Improvement for square matrices:

– Consider an NxN matrix in which n bits are to be interleaved. There is no change in the remaining bits of the column bit patterns, but for the row bit patterns the remaining bits take special bit patterns which are multiples of N/2^n (rounded up, as in the 17x17 example that follows). So, separate lookup tables are required for row and column bit patterns.

– Row and column bit patterns are added to get the modified storage location.

Improvement continues

• For example, a 17x17 matrix with n=2 interleaved bits (actual 289 spaces required):

– Space required by normal Morton layout will be

1000000000 + 0100000000 = 1100000000 = 768 + 1 = 769

– With the improved version, we have 17/2^2 = 5 (rounded up):

Row Lookuptable    Index    Col Lookuptable
0000 0000          0        0000 0000
0000 0010          1        0000 0001
0000 1000          2        0000 0100
0000 1010          3        0000 0101
0101 0000          4        0001 0000
0101 0010          5        0001 0001
...                ...      ...

(Row pattern changed by 5 = 101)

Improvement continues

• For the 17x17 matrix,

– 16 from the row lookup table will be 10100 0000

– 16 from the col lookup table will be 00100 0000

– Total space required will be

  10100 0000
+ 00100 0000
  11000 0000 -> 384 + 1 = 385 spaces required. Improved!
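
The two lookup tables for the square case can be generated as in the sketch below. The struct and function names are hypothetical, and the rounding up of N/2^n is inferred from the 17x17 example (17/2^2 = 5).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct LookupTables {
    std::vector<std::uint32_t> row;
    std::vector<std::uint32_t> col;
};

// Row and column tables for an N x N matrix with n interleaved bits.
inline LookupTables make_square_tables(std::uint32_t N, unsigned n) {
    assert(N >= 1 && n >= 1 && n <= 8);
    const std::uint32_t block = 1u << n;
    const std::uint32_t multiple = (N + block - 1u) / block;  // N / 2^n, rounded up
    LookupTables t;
    t.row.resize(N);
    t.col.resize(N);
    for (std::uint32_t k = 0; k < N; ++k) {
        std::uint32_t spread = 0;  // low n bits of k moved to even positions
        for (unsigned b = 0; b < n; ++b) {
            spread |= ((k >> b) & 1u) << (2 * b);
        }
        const std::uint32_t high = k >> n;  // remaining bits
        t.row[k] = ((high * multiple) << (2 * n)) | (spread << 1);
        t.col[k] = (high << (2 * n)) | spread;
    }
    return t;
}
```

For N=17 and n=2 this reproduces the table entries above, and `row[16] + col[16]` is 384, the last location used.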

Improvement continues

• This technique for square matrices still leaves some extra space, as shown in the example of the 17x17 matrix, although in some cases it works perfectly. Either way, it is an improvement over Morton layout for square matrices whose size is not a power of 2.

Improvement continues

• Generalized improvement for both square and non-square matrices:

– Each row and column has its respective partially interleaved bit pattern.

– Whichever of row or column is greater will have some non-interleaved and some special bit patterns.

– Different lookup tables for rows and columns are required to implement this.

Improvement continues

– Consider an RxC matrix with n interleaved bits; then r = R/2^n and c = C/2^n.

– If r>c, the row will have i regular non-interleaved bits and some special bit patterns that are multiples of j, or vice versa.

– If r>c, the bit fields are, from most to least significant:

For Row:    multiple of j bit pattern | i regular remaining bits | n interleaved bits

For Column: remaining bits | <not used> | n interleaved bits

Improvement continues

– For r>c: i is the value among i = 1, 2, 3, ... for which abs(r - c x 2^i) is the least, and j = MAX(r/2^i, c).

– For c>r: i is the value among i = 1, 2, 3, ... for which abs(c - r x 2^i) is the least, and j = MAX(r, c/2^i).

Improvement continues

– For example, consider a 70x13 matrix with n=2 interleaved bits (actually 910 spaces required). Space required by normal Morton layout will be

10000000100010 + 00000001010000 = 10000001110010 = 8306 + 1 = 8307

Here, R=70, C=13, r = 70/2^2 = 18 and c = 13/2^2 = 4 (both rounded up).

We have r>c:
When i=1, abs(r - c x 2^1) = 10
When i=2, abs(r - c x 2^2) = 2
When i=3, abs(r - c x 2^3) = 14

So i=2 (only used by row in this case) and j = MAX(r/2^2, c) = 5.

Improvement continues

Row Lookuptable    Index    Col Lookuptable
00000 00 0000      0        00000 00 0000
00000 00 0010      1        00000 00 0001
00000 00 1000      2        00000 00 0100
00000 00 1010      3        00000 00 0101
00000 01 0000      4        00001 00 0000
00000 01 0010      5        00001 00 0001
...                ...      ...
00000 11 1010      15
00101 00 0000      16
...

(Leftmost field of the row pattern changed by 5 = 101. The middle field is only used by the row, because row > col.)

Improvement continues

• For the 70x13 matrix,

– 69 from the row lookup table will be 10100 01 0010

– 12 from the col lookup table will be 00011 00 0000

– Total space required will be

  10100 01 0010
+ 00011 00 0000
  10111 01 0010 -> 1490 + 1 = 1491 spaces. Improved!
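
The generalized scheme for the r>c case can be sketched as below. This is an illustration under assumptions rather than the presenter's code: the names are hypothetical, divisions are rounded up (inferred from the 70x13 numbers), ties between candidate values of i are resolved in favour of the smaller i, and i is searched only up to 16.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

struct GeneralTables {
    unsigned i;       // number of regular (non-interleaved) row bits
    std::uint32_t j;  // step of the special row bit patterns
    std::vector<std::uint32_t> row;
    std::vector<std::uint32_t> col;
};

inline std::uint32_t ceil_div(std::uint32_t a, std::uint32_t b) {
    return (a + b - 1u) / b;
}

// Tables for an R x C matrix with n interleaved bits, case r > c only.
inline GeneralTables make_general_tables(std::uint32_t R, std::uint32_t C,
                                         unsigned n) {
    assert(n >= 1 && n <= 4);
    const std::uint32_t r = ceil_div(R, 1u << n);
    const std::uint32_t c = ceil_div(C, 1u << n);
    assert(c >= 1 && r > c);

    GeneralTables t;
    // Pick i = 1, 2, 3, ... with the smallest abs(r - c * 2^i).
    t.i = 1;
    long long best = std::llabs(static_cast<long long>(r) -
                                (static_cast<long long>(c) << 1));
    for (unsigned i = 2; i <= 16; ++i) {
        const long long d = std::llabs(static_cast<long long>(r) -
                                       (static_cast<long long>(c) << i));
        if (d < best) { best = d; t.i = i; }
    }
    t.j = std::max(ceil_div(r, 1u << t.i), c);

    // Move the low n bits of k to the even bit positions.
    auto spread = [n](std::uint32_t k) {
        std::uint32_t s = 0;
        for (unsigned b = 0; b < n; ++b) s |= ((k >> b) & 1u) << (2 * b);
        return s;
    };
    const std::uint32_t regular_mask = (1u << t.i) - 1u;

    t.row.resize(R);
    for (std::uint32_t k = 0; k < R; ++k) {
        const std::uint32_t high = k >> n;  // bits above the interleaved part
        t.row[k] = (((high >> t.i) * t.j) << (t.i + 2 * n))  // multiple of j
                 | ((high & regular_mask) << (2 * n))        // i regular bits
                 | (spread(k) << 1);                         // interleaved bits
    }
    t.col.resize(C);
    for (std::uint32_t k = 0; k < C; ++k) {
        t.col[k] = ((k >> n) << (t.i + 2 * n)) | spread(k);
    }
    return t;
}
```

For the 70x13 matrix with n=2 this yields i=2 and j=5, and `row[69] + col[12]` is 1490, matching the total worked out above.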

Recommendations

• Devise more efficient algorithms to utilize the space wasted by Morton array layout.

• If an optimal compromise algorithm is devised which works with both non-square and square matrices, it could form a new research paper or graduate research project.

Conclusion

• Morton array layout, and a variant of it that reduces the space wasted by Morton layout, were implemented in C++.

• Improvements on Morton layout for non-square and square matrices were introduced.

• An optimal algorithm is still to be researched.

Conclusion continues

• C++ header file of Morton Array Layout matrix class can be downloaded and evaluated from: http://www.sharad.info/cs691

• For any defects or feedback regarding this header file, please email me at [email protected]

Any Questions?

Thank You!