Optimization of Arithmetic Coding By Kolluru Krishna Bharath.
Outline
Objective
Motivation
Optimizations w.r.t. platforms
◦ Predication
Optimizations w.r.t. algorithm (Arithmetic Coding)
◦ Sequential AC
◦ Parallel AC
Conclusion
Objective
To study the performance of the algorithm on different platforms.
To optimize the algorithm to achieve better performance.
Motivation
MQ Coding
MQ coding uses:
1. Resizing (renormalization) of the interval, which eliminates the need for high precision in the range calculation.
2. Adaptive estimation of the MPS (most probable symbol) probability, which requires only one pass.
3. Integer arithmetic.
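The interval resizing in point 1 can be sketched as a renormalization step. This is a minimal Python illustration, assuming a 16-bit interval register `a` that is kept at or above a hypothetical threshold `HALF = 0x8000` (the convention of MQ-style coders) and a code register `c` shifted along with it; the byte output and carry handling that a real MQ coder needs are omitted.

```python
HALF = 0x8000  # hypothetical renormalization threshold for a 16-bit interval register


def renormalize(a: int, c: int) -> tuple[int, int, int]:
    """Double the interval width `a` and the code register `c` until a >= HALF.

    Keeping `a` in a fixed range lets the next interval split be computed
    with plain integer arithmetic instead of ever-growing precision.
    Returns the new (a, c) and the number of shifts performed.
    """
    if a <= 0:
        raise ValueError("interval width must be positive")
    shifts = 0
    while a < HALF:
        a <<= 1  # double the interval width
        c <<= 1  # shift the code register in step; a real coder would emit bits here
        shifts += 1
    return a, c, shifts
```

For example, `renormalize(0x1000, 1)` needs three doublings and returns `(0x8000, 8, 3)`, while an interval already at or above `HALF` is returned unchanged.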
Machine-specific optimizations
Compilers take advantage of the underlying architecture.
Examples of machine-specific optimizations are:
◦ Predication
◦ Software pipelining
The ARM core supports predication.
Predication
Objective:
◦ Eliminate hard-to-predict branches.
◦ Increase instruction-level parallelism (ILP).
Advantages of using ARM:
◦ It supports predication (conditionally executed instructions), and
◦ It gives the option of setting the condition flags with every arithmetic and logical instruction (the S suffix).
Predication - Examples
UPDATING THE COUNT
    LDRB  R4,[R1]      ;R4 HAS THE SYMBOL.
    LDRB  R5,[R2]      ;R5 HAS THE COUNT OF ZEROS (COUNT_0).
    LDRB  R6,[R3]      ;R6 HAS THE COUNT OF ONES (COUNT_1).
    CMP   R4,#0        ;CHECK WHETHER THE SYMBOL READ FROM THE SOURCE IS 0 OR 1.
    ADDEQ R5,R5,#1     ;IF ZERO, ADD 1 TO COUNT_0.
    ADDNE R6,R6,#1     ;IF ONE, ADD 1 TO COUNT_1.
    STRB  R5,[R2]      ;STORE THE UPDATED COUNT_0.
    STRB  R6,[R3]      ;STORE THE UPDATED COUNT_1.

EVALUATING THE MPS AND THE LPS
    CMP   R5,R6        ;CHECK WHETHER COUNT_0 >= COUNT_1.
    MOVGE R4,#0        ;IF YES, THE MPS IS 0.
    MOVLT R4,#1        ;IF NO, THE MPS IS 1.
    LDR   R0,=(MPS)
    STRB  R4,[R0]

WITHOUT PREDICATION, THE CODE WOULD LOOK LIKE
    LDR   R0,=(MPS)
    CMP   R5,R6
    BGE   LOOP1        ;HIGHLY UNPREDICTABLE BRANCH.
    MOV   R4,#1
    STRB  R4,[R0]
    B     EXIT
LOOP1:
    MOV   R4,#0
    STRB  R4,[R0]
EXIT:
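What the predicated sequence computes can be modeled in Python. Python has no condition codes, so this only imitates if-conversion: each comparison becomes a 0/1 value and both updates are always evaluated, just as both predicated instructions are always issued. The function names are hypothetical, and the MPS rule follows the branching listing (MPS = 0 when count_0 >= count_1).

```python
def update_counts(symbol: int, count0: int, count1: int) -> tuple[int, int]:
    """Branch-free count update, mirroring CMP R4,#0 / ADDEQ / ADDNE."""
    is_one = int(symbol != 0)  # the NE condition as 0/1
    count0 += 1 - is_one       # ADDEQ: takes effect only when the symbol is 0
    count1 += is_one           # ADDNE: takes effect only when the symbol is 1
    return count0, count1


def select_mps(count0: int, count1: int) -> int:
    """Branch-free MPS choice, mirroring CMP R5,R6 / MOVGE R4,#0 / MOVLT R4,#1."""
    return int(count0 < count1)  # 0 if zeros are at least as frequent, else 1


# Example: process the bits 0, 1, 1 starting from zero counts.
count0 = count1 = 0
for bit in (0, 1, 1):
    count0, count1 = update_counts(bit, count0, count1)
mps = select_mps(count0, count1)  # count0 = 1, count1 = 2, so the MPS is 1
```

Because the ARM listing loads and stores the counts with LDRB/STRB, they are 8-bit values that would wrap past 255; Python integers do not wrap, so this model ignores that limit.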
Predication - Continued
What is the advantage of using predication for MQ coding?
◦ The algorithm has small loops, and
◦ Its branches are highly unpredictable.
This favors predication. Performance analysis shows a speedup of 2.75 when a conditional branch is replaced with predicated instructions.
Predication - Continued
Can predication be used for all algorithms?
No. Predication suits code with certain characteristics, such as:
◦ Highly unpredictable branches.
◦ Small loops (preferably); otherwise the cost of executing both directions of a branch can exceed the cost of a misprediction.
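The trade-off can be made concrete with a deliberately simplified cost model (an assumption for illustration, not a measurement from these slides): a branch costs one cycle plus the instructions on the path actually taken plus an expected misprediction penalty, while if-converted code issues the instructions of both paths every time. All parameter names and values below are hypothetical.

```python
def branch_cost(p_taken: float, then_len: int, else_len: int,
                mispredict_rate: float, penalty: int) -> float:
    """Expected cycles for a conditional branch (one cycle per instruction assumed)."""
    path = p_taken * then_len + (1 - p_taken) * else_len
    return 1 + path + mispredict_rate * penalty


def predicated_cost(then_len: int, else_len: int) -> int:
    """Cycles for if-converted code: every predicated instruction is issued."""
    return then_len + else_len


# Short, unpredictable arms: predication wins (2 cycles vs 1 + 1 + 0.5 * 4 = 4).
short_pred = predicated_cost(1, 1)
short_branch = branch_cost(0.5, 1, 1, 0.5, 4)

# Long, fairly predictable arms: the branch wins (40 cycles vs about 21.4).
long_pred = predicated_cost(20, 20)
long_branch = branch_cost(0.5, 20, 20, 0.1, 4)
```

The MPS selection above is the first case: each arm is a single MOV, so issuing both costs less than a likely misprediction.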
Optimization of the Algorithm
Sequential AC - Data Flow Diagram
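Begin
    L(1) = 0; H(1) = 1; F(0) = 0;
    for (j = 1 to N) {
        i = index_of_symbol(j);
        L(j+1) = L(j) + (H(j) - L(j)) * F(i-1);
        H(j+1) = L(j) + (H(j) - L(j)) * F(i);
    }
    output((L(N+1) + H(N+1)) / 2);
End
Here F(i) is the cumulative probability of the first i symbols, so each step narrows [L, H) to the sub-interval belonging to symbol i.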
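A runnable version of this pseudocode, using exact rational arithmetic (Python's `fractions.Fraction`) so that the precision problem MQ coding avoids does not arise in the sketch. The alphabet, probabilities and function names are illustrative assumptions; `F` is the cumulative distribution with `F[0] = 0`, and symbols are indexed from 1 as in the pseudocode.

```python
from fractions import Fraction


def cumulative(probs):
    """Return F with F[0] = 0 and F[i] = p_1 + ... + p_i."""
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + Fraction(p))
    return F


def encode_interval(message, alphabet, probs):
    """Sequential AC: narrow [L, H) once per symbol; each step needs the previous one."""
    F = cumulative(probs)
    index = {s: k + 1 for k, s in enumerate(alphabet)}  # 1-based symbol index
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        i = index[symbol]
        width = high - low
        # Both bounds use the old low and width, like L(j) and H(j) in the pseudocode.
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


def encode(message, alphabet, probs):
    """Output the midpoint (L + H) / 2 of the final interval."""
    low, high = encode_interval(message, alphabet, probs)
    return (low + high) / 2


probs = [Fraction(3, 4), Fraction(1, 4)]  # P(a) = 3/4, P(b) = 1/4
print(encode_interval("ab", "ab", probs))  # (Fraction(9, 16), Fraction(3, 4))
print(encode("ab", "ab", probs))           # 21/32
```

The loop-carried dependence is visible in the tuple assignment: the interval for symbol j+1 cannot be computed before the interval for symbol j.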
The dependence matrix is [1 1]: each iteration depends on the previous one (distance 1) through both L and H.
Sequential AC - Dependence Graph
[Figure: dependence graphs for the 1-dimensional and 2-dimensional loops (axes I and J)]
1. Inner-loop and outer-loop parallelism are absent.
2. Loop interchange doesn’t help.
Parallel AC - Data Flow Graph
Parallel AC - Dependence Graph
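Doall i = 1 to 2
    L(i,0) = 0; H(i,0) = 1;
    Do j = 1 to 2
        l = index_of_symbol(j,i);
        L(i,j) = L(i,j-1) + (H(i,j-1) - L(i,j-1)) * F(l-1);
        H(i,j) = L(i,j-1) + (H(i,j-1) - L(i,j-1)) * F(l);
    Enddo
Enddoall
L_final = L(1,2) + (H(1,2) - L(1,2)) * L(2,2);
H_final = L(1,2) + (H(1,2) - L(1,2)) * H(2,2);
Each processor i encodes its own block of symbols starting from [0, 1); the block-2 interval is then rescaled into the block-1 interval.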
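The combination step works because each symbol maps the current interval affinely, so the interval a block produces from [0, 1) can be rescaled into the interval of the preceding blocks. A minimal Python sketch, assuming a static (non-adaptive) model shared by all blocks and hypothetical helper names; threads are used only to show that the blocks are independent, since CPython threads will not speed up this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def block_interval(block, alphabet, probs):
    """Encode one block from [0, 1), exactly like the sequential coder."""
    F = [Fraction(0)]
    for p in probs:
        F.append(F[-1] + Fraction(p))
    index = {s: k + 1 for k, s in enumerate(alphabet)}
    low, high = Fraction(0), Fraction(1)
    for symbol in block:
        i = index[symbol]
        width = high - low
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


def parallel_interval(message, alphabet, probs, n_blocks):
    """Split the message, encode the blocks independently, then combine."""
    if not message:
        return Fraction(0), Fraction(1)
    size = -(-len(message) // n_blocks)  # ceiling division
    blocks = [message[k:k + size] for k in range(0, len(message), size)]
    with ThreadPoolExecutor() as pool:  # the Doall loop: no dependences between blocks
        parts = list(pool.map(lambda b: block_interval(b, alphabet, probs), blocks))
    low, high = Fraction(0), Fraction(1)
    for l, h in parts:
        # L_final = L1 + (H1 - L1) * L2 and H_final = L1 + (H1 - L1) * H2, left to right
        width = high - low
        low, high = low + width * l, low + width * h
    return low, high


probs = [Fraction(3, 4), Fraction(1, 4)]
print(parallel_interval("abba", "ab", probs, 2))  # (Fraction(45, 64), Fraction(189, 256))
print(block_interval("abba", "ab", probs))        # same interval, computed sequentially
```

With an adaptive model, as in MQ coding, the probabilities for block 2 would depend on block 1, so this exact split-and-combine would no longer hold.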
[Figure 1: Dependence graph for the Parallel AC code (axes I and J)]
Dependence Matrix for Figure 1
Performance - Arithmetic Coding
For a text message of length 1800, parallel arithmetic coding showed the following speedups:
◦ 4.875 (excluding the overhead of loading the values into separate processors)
◦ 1.66 (including that overhead)
For a text message of length ~10000, the parallel version showed a speedup of 2 (including overhead).
Conclusion
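Running the MQ coder on an ARM core, with predicated instructions replacing hard-to-predict branches, improves the performance of the algorithm.
Tuning the AC for parallel execution gives a substantial speedup (1.66 to 2 including overhead, 4.875 without it).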
Thank you
Questions?
Backup Slides
Predication
◦ Performance analysis shows that replacing a conditional branch with predicated instructions gives a speedup of 2.75, i.e. a reduction from 0.264 µs to 0.096 µs at a clock frequency of 41 MHz (cycle time ≈ 24 ns). Each time a symbol (1/0) is encoded, 7 cycles are saved for every branch replaced by predicated instructions. For a black-and-white image of size 256x256, this saves ~0.5M cycles.
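The figures above can be checked with a few lines of arithmetic. This sketch uses the slide's rounded 24 ns cycle time and assumes one encoded symbol per pixel of the 256x256 black-and-white image.

```python
CYCLE_NS = 24                       # ~1 / 41 MHz, rounded as on the slide
branch_ns, predicated_ns = 264, 96  # 0.264 us and 0.096 us

speedup = branch_ns / predicated_ns                   # 2.75
branch_cycles = branch_ns // CYCLE_NS                 # 11 cycles
predicated_cycles = predicated_ns // CYCLE_NS         # 4 cycles
saved_per_symbol = branch_cycles - predicated_cycles  # 7 cycles

symbols = 256 * 256                        # assumption: one binary symbol per pixel
saved_total = symbols * saved_per_symbol   # 458,752 cycles, i.e. ~0.5M

print(speedup, saved_per_symbol, saved_total)  # 2.75 7 458752
```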
Performance - Arithmetic Coding
On the same testbench, sequential and parallel arithmetic coding show a dramatic difference in execution time:
◦ Sequential: 0.078 seconds
◦ Parallel: 0.016 seconds (excluding the loading overhead)
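The 4.875 speedup follows directly from these times. The second calculation is an inference rather than a figure from the slides: if the 1.66 speedup with overhead was measured on this same testbench, the parallel run including data loading would take roughly 0.047 s.

```python
sequential_s = 0.078
parallel_s = 0.016  # excluding the loading overhead

speedup = sequential_s / parallel_s  # about 4.875
# Assumption: the 1.66 speedup (with overhead) refers to the same testbench.
parallel_with_overhead_s = sequential_s / 1.66       # about 0.047 s
overhead_s = parallel_with_overhead_s - parallel_s   # about 0.031 s spent loading data

print(f"{speedup:.3f} {parallel_with_overhead_s:.3f} {overhead_s:.3f}")
```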