Optimization of Arithmetic Coding By Kolluru Krishna Bharath.

Page 1: Optimization of Arithmetic Coding By Kolluru Krishna Bharath.

Page 2: Outline

Objective
Motivation
Optimizations w.r.t. platforms
◦ Predication
Optimizations w.r.t. algorithm (Arithmetic Coding)
◦ Sequential AC
◦ Parallel AC
Conclusion

Page 3: Objective

To study the performance of the algorithm on different platforms.

To optimize the algorithm to achieve better performance.

Page 4: Motivation

Page 5: MQ Coding

MQ Coding:
1. Resizing (renormalization) of the interval, which eliminates the need for high precision in the range calculation.
2. Adaptive probability (MPS) calculation, which requires only one pass.
3. Integer arithmetic.
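The interval resizing in point 1 can be sketched as follows. This is a minimal illustration, not the coder from the slides: the 16-bit register and the 0x8000 threshold are assumptions modeled on MQ-style coders, the names `renormalize`, `interval` and `code` are hypothetical, and byte output and carry handling are left out.

```python
def renormalize(interval, code):
    """Double the interval register until it is at least 0x8000.

    Assumption: a 16-bit interval register whose top bit marks
    "large enough"; the code register is shifted in step with it.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    shifts = 0
    while interval < 0x8000:
        interval <<= 1   # resize the interval using integer shifts only
        code <<= 1       # keep the code register aligned with the interval
        shifts += 1
    return interval, code, shifts


if __name__ == "__main__":
    # A quarter-size interval needs two doublings.
    print(renormalize(0x2000, 0x0001))
```

Because the interval is pulled back into a fixed range after every symbol, a fixed-width integer register is enough however long the message is.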

Page 6: Machine-specific optimizations

Compilers take advantage of the architecture underneath.

Examples of machine-specific optimizations are
◦ Predication
◦ Software pipelining

The ARM core supports predication.

Page 7: Predication

Objective
◦ Eliminate hard-to-predict branches.
◦ Increase instruction-level parallelism (ILP).

Advantages of using ARM:
◦ Supports predication (condition codes on instructions).
◦ Gives the option of setting the flags on every arithmetic and logical instruction.

Page 8: Predication - Examples

; UPDATING THE COUNT
    LDRB  R4,[R1]     ; R4 HAS THE SYMBOL.
    LDRB  R5,[R2]     ; R5 HAS THE COUNT OF ZEROS (COUNT_0).
    LDRB  R6,[R3]     ; R6 HAS THE COUNT OF ONES (COUNT_1).
    CMP   R4,#0       ; CHECK IF THE SYMBOL READ FROM THE SOURCE IS 0 OR 1.
    ADDEQ R5,R5,#1    ; IF ZERO, ADD 1 TO COUNT_0.
    ADDNE R6,R6,#1    ; IF ONE, ADD 1 TO COUNT_1.
    STRB  R5,[R2]     ; COUNT_0 HAS BEEN UPDATED WITH THE NEW VALUE.
    STRB  R6,[R3]     ; COUNT_1 HAS BEEN UPDATED WITH THE NEW VALUE.

; EVALUATING THE MPS AND THE LPS.
    CMP   R5,R6       ; CHECK IF COUNT_0 >= COUNT_1.
    MOVGE R4,#0       ; IF YES, 0 IS THE MPS: MOVE 0 INTO R4.
    MOVLT R4,#1       ; IF NO, 1 IS THE MPS: MOVE 1 INTO R4.
    LDR   R0,=(MPS)   ; R0 HAS THE ADDRESS OF MPS.
    STRB  R4,[R0]     ; STORE THE MPS.

; IF NO PREDICATION, THE CODE WOULD LOOK LIKE
    LDR   R0,=(MPS)
    CMP   R5,R6
    BGE   LOOP1       ; HIGHLY UNPREDICTABLE BRANCH.
    MOV   R4,#1
    STRB  R4,[R0]
    B     EXIT
LOOP1
    MOV   R4,#0
    STRB  R4,[R0]
EXIT
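As a cross-check on the assembly, the count update and the MPS selection can be written as a small branch-free reference model. This is a sketch under stated assumptions: the function names are hypothetical, the counts are taken to be single bytes (as the LDRB/STRB accesses suggest), EQ after CMP R4,#0 is read as "the symbol is 0", and 0 is taken as the MPS when the counts tie, as in the branch version.

```python
def update_counts(symbol, count_0, count_1):
    """Mirror of the conditional adds: exactly one counter moves per symbol."""
    is_zero = int(symbol == 0)
    # Both sums are always computed, like predicated instructions that are
    # fetched on every run; only the increment value differs.
    count_0 = (count_0 + is_zero) & 0xFF        # STRB keeps the low 8 bits
    count_1 = (count_1 + (1 - is_zero)) & 0xFF
    return count_0, count_1


def select_mps(count_0, count_1):
    """Return 0 when zeros are at least as frequent as ones, else 1."""
    return int(count_0 < count_1)


if __name__ == "__main__":
    counts = (0, 0)
    for bit in [1, 0, 1, 1]:
        counts = update_counts(bit, *counts)
    print(counts, select_mps(*counts))
```

Neither function contains a data-dependent branch, which is the property the predicated ARM sequence relies on.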

Page 9: Predication - Continued

What is the advantage of using predication for MQ coding?
◦ The algorithm has small loops.
◦ The branches are highly unpredictable.

This favors predication. Performance analysis shows a speedup of 2.75 when a conditional branch instruction is replaced with predicated instructions.

Page 10: Predication – Continued

Can predication be used for all algorithms?

No. Predication pays off only when the code has certain characteristics, such as
◦ Highly unpredictable branches.
◦ Small loops (preferable); otherwise the cost of executing both directions could exceed the cost of a misprediction.
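The trade-off above can be made concrete with a deliberately simplified cost model. Everything in it is an assumption for illustration: the function names, the idea of counting cost in instruction slots, and the example penalty of 8 cycles are hypothetical and are not measurements from the slides.

```python
def branch_cost(len_then, len_else, p_then, p_mispredict, penalty):
    """Expected cycles with a branch: one path runs, plus misprediction cost."""
    expected_path = p_then * len_then + (1 - p_then) * len_else
    return expected_path + p_mispredict * penalty


def predicated_cost(len_then, len_else):
    """With predication both paths are issued on every run."""
    return len_then + len_else


def predication_wins(len_then, len_else, p_then, p_mispredict, penalty):
    return predicated_cost(len_then, len_else) < branch_cost(
        len_then, len_else, p_then, p_mispredict, penalty
    )


if __name__ == "__main__":
    # Short paths, coin-flip branch: predication is cheaper.
    print(predication_wins(1, 1, 0.5, 0.5, 8))
    # Long paths: executing both directions costs more than mispredicting.
    print(predication_wins(20, 20, 0.5, 0.5, 8))
```

With one-instruction paths the predicated form costs 2 against an expected 5 for the branch; with 20-instruction paths it costs 40 against 24, which is the "small loops" condition in the list.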

Page 11: Optimization of the Algorithm

Page 12: Sequential AC - Data Flow Diagram

Begin
  L(1)=0; H(1)=1; F(0)=0;
  for(j=1 to N) {
    i=index_of_symbol(j);
    L(j+1)=L(j)+(H(j)-L(j))*F(i-1);
    H(j+1)=L(j)+(H(j)-L(j))*F(i);
  }
  output((L(N+1)+H(N+1))/2);
end

Dependence matrix is given by [ 1 1 ]
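The loop above can be run directly with exact fractions. This is a minimal sketch, assuming symbols are given as 1-based indices i and F is the cumulative distribution with F(0)=0; the names `cumulative` and `encode_interval` and the example probabilities are hypothetical.

```python
from fractions import Fraction


def cumulative(probabilities):
    """Build F with F[0] = 0 and F[i] = sum of the first i probabilities."""
    F = [Fraction(0)]
    for p in probabilities:
        F.append(F[-1] + p)
    return F


def encode_interval(symbols, F):
    """Narrow [low, high) once per symbol; step j+1 needs the result of step j."""
    low, high = Fraction(0), Fraction(1)
    for i in symbols:
        width = high - low
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


if __name__ == "__main__":
    F = cumulative([Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)])
    low, high = encode_interval([1, 2], F)
    print(low, high, (low + high) / 2)   # 1/4 3/8 5/16
```

Each iteration reads the bounds written by the previous one, which is the distance-1 dependence recorded in the dependence matrix.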

Page 13: Sequential AC - Dependence Graph

[Figure: dependence graphs for the 1-dimensional loop and the 2-dimensional loop; axes I and J]

1. Inner-loop and outer-loop parallelism are absent.

2. Loop interchange doesn’t help.

Page 14: Parallel AC - Data Flow Graph

Page 15: Parallel AC – Dependence Graph

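Doall i=1 to 2
  L(1,i)=0; H(1,i)=1;
  Do j=1 to 2
  {
    l=index_of_symbol(j,i);
    L(j+1,i)=L(j,i)+(H(j,i)-L(j,i))*F(l-1);
    H(j+1,i)=L(j,i)+(H(j,i)-L(j,i))*F(l);
  }
  Enddo
Enddoall

L_final = L12 + (H12-L12)*L22;
H_final = L12 + (H12-L12)*H22;

Here L12, H12 are the final bounds of block i=1 and L22, H22 are the final bounds of block i=2, each taken after both of its symbols.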

[Figure(1): Dependence graph for Parallel AC; axes I and J]

[Dependence matrix for Figure(1)]
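The block-wise scheme on this slide can be sketched as follows: each block is encoded independently from the interval [0, 1), and the per-block intervals are then nested in order. This is an illustration under assumptions, not the measured implementation: the function names are hypothetical, exact fractions stand in for the integer arithmetic, and a thread pool only demonstrates that the blocks are independent (it is not expected to give a real speedup in CPython).

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def encode_block(symbols, F):
    """Sequential interval narrowing for one block, starting from [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    for i in symbols:
        width = high - low
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


def combine(first, second):
    """Nest the second block's interval inside the first block's interval."""
    low_1, high_1 = first
    low_2, high_2 = second
    width = high_1 - low_1
    return low_1 + width * low_2, low_1 + width * high_2


def encode_parallel(blocks, F):
    """Encode every block independently (the Doall), then combine in order."""
    with ThreadPoolExecutor() as pool:
        intervals = list(pool.map(lambda block: encode_block(block, F), blocks))
    result = intervals[0]
    for interval in intervals[1:]:
        result = combine(result, interval)
    return result


if __name__ == "__main__":
    F = [Fraction(0), Fraction(1, 2), Fraction(3, 4), Fraction(1)]
    print(encode_parallel([[1, 2], [3, 1]], F))
    print(encode_block([1, 2, 3, 1], F))   # same interval, computed sequentially
```

The combine step is the L_final / H_final formula from the slide; it is the only part that still has to run in order.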

Page 16: Performance – Arithmetic Coding

The parallel Arithmetic Coding for a text message of length 1800 showed the following speedup:
◦ 4.875 (without the overhead of loading the values into separate processors)
◦ 1.66 (with the overhead)

For a text message of length ~10000, the parallel version showed a speedup of 2 (with overhead).

Page 17: Conclusion

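Running the MQ coder on an ARM core with predication improves the performance of the algorithm (a speedup of 2.75 on each replaced conditional branch).

Tuning the AC for parallel execution gives a speedup of up to 4.875 without the loading overhead, and of 1.66 to 2 with it.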

Page 18: Thank you

Page 19: Questions?

Page 20: Backup Slides

Page 21: Predication

Predication
◦ Performance analysis shows a speedup of 2.75 when a conditional branch instruction is replaced with predicated instructions, i.e. a reduction from 0.264 µs to 0.096 µs at a clock frequency of 41 MHz (24 ns per cycle). Each time a symbol (1/0) is encoded, we save 7 cycles (i.e. 7 cycles/run) for every predicated instruction used instead of a branch instruction. When this is executed on, say, a black & white image of size 256x256, we save ~0.5M cycles.
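The figures on this slide can be reproduced with a few lines of arithmetic. The sketch assumes the 24 ns cycle time quoted above, one encoded symbol per pixel, and one replaced branch per symbol; the constant names are hypothetical.

```python
CYCLE_NS = 24         # clock period quoted on the slide for 41 MHz
BRANCH_NS = 264       # 0.264 us with the conditional branch
PREDICATED_NS = 96    # 0.096 us with predicated instructions

# Ratio of the two timings.
speedup = BRANCH_NS / PREDICATED_NS

# 264 ns is 11 cycles and 96 ns is 4 cycles, so 7 cycles are saved per run.
cycles_saved = (BRANCH_NS - PREDICATED_NS) // CYCLE_NS

# Assumption: a 256x256 black & white image contributes one symbol per pixel.
image_symbols = 256 * 256
total_saved = cycles_saved * image_symbols

if __name__ == "__main__":
    print(speedup, cycles_saved, total_saved)   # 2.75 7 458752
```

The total of 458752 cycles is the "~0.5M cycles" quoted on the slide.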

Page 22: Performance – Arithmetic Coding

The sequential and parallel Arithmetic Coding runs on the same testbench show a dramatic change in execution time:
◦ Sequential – 0.078 seconds
◦ Parallel – 0.016 seconds
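These two timings give the 4.875 speedup reported for the run without loading overhead. The sketch below also backs out the overhead implied by the 1.66 figure from the performance slide; that second step is an inference that assumes both figures come from the same 1800-symbol testbench, and the function names are hypothetical.

```python
def speedup(sequential_ms, parallel_ms):
    """Ratio of sequential to parallel execution time."""
    return sequential_ms / parallel_ms


def implied_overhead_ms(sequential_ms, parallel_ms, speedup_with_overhead):
    """Extra time that would reduce the speedup to the with-overhead figure."""
    return sequential_ms / speedup_with_overhead - parallel_ms


if __name__ == "__main__":
    print(speedup(78, 16))                              # 4.875
    print(round(implied_overhead_ms(78, 16, 1.66), 1))  # about 31 ms (inferred)
```

Under that assumption, loading the values into separate processors takes roughly twice as long as the parallel encoding itself, which would explain why the longer ~10000-symbol message reaches a higher speedup with overhead.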

Page 23: References

Howard & Vitter, “Arithmetic Coding for Data Compression”.

David Sehr, Jay Bharadwaj, Jim Pierce, Priti Shrivastav, Carole Dulong, “IA-64 Compiler Technology”.

Utpal Banerjee, “Loop Parallelization”.

Pierre Boulet, Darte and Silber, “Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation”.

Supol and Melichar, “Arithmetic Coding in Parallel”.