Data Compression Practicals - VIII IT
LJIET Data Compression VIII - IT
Page 1
L. J. Institute of Engineering & Technology S.G. Highway, Ahmedabad-382210
CE/IT Department Practical List
Subject Name: Data Compression
Subject Code:
Branch & Sem: VIII - IT
Term Duration: From 2-Jan-2012 To 28-Apr-2012

SR. NO.  SUBJECT            TEACHING SCHEME (HOURS)          CREDITS
                            THEORY  TUTORIAL  PRACT.
1        Data Compression   4       0         2              6
Sr. No.  Aim of Practical                                                      Practical Week               Submission Week
1.  Write a program to count the occurrences of different letters by reading
    the given text file, and find the probability of each letter and the
    number of bits required for each, using: No. of bits = log2(1/prob_i).    9-Jan-2012 - 14-Jan-2012     16-Jan-2012 - 21-Jan-2012
2.  Write a program in C to determine whether the set of given codes is
    uniquely decodable or not.                                                16-Jan-2012 - 21-Jan-2012    23-Jan-2012 - 28-Jan-2012
3.  Study of Huffman Compression Algorithm                                    23-Jan-2012 - 28-Jan-2012    30-Jan-2012 - 4-Feb-2012
4.  Study of Shannon-Fano Compression Algorithm                               30-Jan-2012 - 4-Feb-2012     6-Feb-2012 - 11-Feb-2012
5.  Write a program to implement arithmetic coding.                           6-Feb-2012 - 11-Feb-2012     13-Feb-2012 - 18-Feb-2012
6.  Write a program to implement the LZ77 algorithm.                          12-Mar-2012 - 17-Mar-2012    19-Mar-2012 - 24-Mar-2012
7.  Write a program to implement the LZ78 algorithm.                          19-Mar-2012 - 24-Mar-2012    26-Mar-2012 - 31-Mar-2012
8.  Write a program to implement the LZSS algorithm.                          26-Mar-2012 - 31-Mar-2012    2-Apr-2012 - 7-Apr-2012
9.  Study of Speech Compression.                                              2-Apr-2012 - 7-Apr-2012      9-Apr-2012 - 14-Apr-2012
                                                                                                           (Final Submission)
Faculty Name : Ms. Seema Mahajan
PRACTICAL 1
AIM: Write a program to count the occurrences of the different letters in a
given text file, and to find the probability of each letter and the number of
bits required for it, using the formula: No. of bits = log2(1/prob_i).
Description of Algorithm
Open a text file in read mode.
Read the file character by character, store each character temporarily, compare it
with the defined set of characters, and count the occurrences of each character.
Find the probability of occurrence of each character in the file using the equation:
Probability = No. of occurrences of a character / Total no. of characters.
Find the number of bits required to encode each letter:
Bits = -log(prob_i)/log(2), rounded up to the nearest integer.
Find the total number of bits required to encode the file.
Example:
Contents of text file:
aabbccd
Probability of each character and number of bits required:
a=0.2857 b=0.2857 c=0.2857 d=0.1428
No. of bits required after compression for each character:
a=2 b=2 c=2 d=3
Total number of bits required after compression:
(2*2) + (2*2) + (2*2) + (3*1) = 15
Source Code
#include<stdio.h>
#include<conio.h>
#include<math.h>
void main()
{
float nofchar=0,prob[26];
float y[26],s[26],x=0;
FILE *fp;
int b[26],j=0,h=0,i;
char a[26]={'a','b','c','d','e','f','g','h','i','j','k','l','m',
'n','o','p','q','r','s','t','u','v','w','x','y','z'};
clrscr();
fp=fopen("c:\\tc\\bin\\text.txt","r");
fseek(fp,0,0);
for(j=0;j<=25;j++)
{
b[j]=0;
}
printf("Data of the File : ");
while((i=getc(fp))!=EOF) //Reading characters of the file
{
printf("%c",i);
for(j=0;j<=25;j++)
{ //Comparing characters with the defined character set
if(i==a[j] || i==(a[j]-32))
{
b[j]++;
h++;
}
}
}
printf("\nNo. of Bits Required Before Compression:%d\n\n",(h*7));
for(j=0;j<=25;j++)
{
nofchar=nofchar+b[j];
}
printf("Probability Of Each Character :\n\n");
//Calculating the probability of each character
for(j=0;j<=25;j++)
{
prob[j]=b[j]/nofchar;
printf("%c = %f\t",a[j],prob[j]);
}
printf("\n\n");
printf("Bits Required per character : \n\n");
//Calculating the bits required for each character
for(j=0;j<=25;j++)
{
if(prob[j]==0)
{
s[j]=0;
y[j]=0;
continue;
}
y[j]=((log(prob[j])/log(2)));
y[j]=ceil((-1)*y[j]);
s[j]=y[j]*b[j];
//Accumulating the total bits required after compression
x=x+s[j];
printf("%c = %f\t",a[j],y[j]);
}
printf("\n\nNo of Bits Required After Compression: ");
printf("%f",x);
printf("\n\nCompression Ratio = %f",((h*7)/x));
getch();
}
Advantage:
The compression ratio tells us how much compression encoding the file will achieve.
Disadvantage:
Memory usage is high because of the large number of loops and arrays.
Limitation:
Characters other than alphabetic letters are not supported.
Input:
Text file with contents:
lossless
Output
Data of the File: lossless
No. of Bits Required Before Compression: 56
Probability of Each Character:
a = 0.000000 b = 0.000000 c = 0.000000 d = 0.000000 e = 0.125000
f = 0.000000 g = 0.000000 h = 0.000000 i = 0.000000 j = 0.000000
k = 0.000000 l = 0.250000 m = 0.000000 n = 0.000000 o = 0.125000
p = 0.000000 q = 0.000000 r = 0.000000 s = 0.500000 t = 0.000000
u = 0.000000 v = 0.000000 w = 0.000000 x = 0.000000 y = 0.000000
z = 0.000000
Bits Required per Character:
e = 3.000000 l = 2.000000 o = 3.000000 s = 1.000000
No of Bits Required After Compression: 14.000000
Compression Ratio = 4.000000
PRACTICAL 2
AIM: Write a program in C to determine whether the set of given codes
is uniquely decodable or not.
Description of algorithm:
Get the set of codes whose unique decodability has to be determined.
Suppose we have two code words:
o A, having a length of k bits
o B, having a length of n bits, where n > k.
If the first k bits of codeword B are identical to codeword A, then the remaining
n - k bits are a dangling suffix.
Calculate the dangling suffixes for the given set of codes, testing newly produced
suffixes in turn.
If a dangling suffix is itself the codeword for a symbol, then the code is not
uniquely decodable; otherwise the given code is uniquely decodable.
Code for the practical:
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<stdlib.h>
void main()
{
int i,n,j,k=0,si,sj,a,b,c;
char *data[10],temp[10],dang[10],x[2];
clrscr();
printf("Enter the no of elements:");
scanf("%d",&n);
for(i=0;i<n;i++)
{
printf("\n%d.",i+1);
fflush(stdin);
gets(temp);
data[i]=(char *)malloc(sizeof(temp));
strcpy(data[i],temp);
}
k=i;
printf("\n");
for(i=0;i<n;i++)
{
for(j=1;j<n;j++)
{
si=strlen(data[i]);
sj=strlen(data[j]);
if(si<sj && j!=i)
{
strcpy(temp,"");
for(a=(strspn(data[i],data[j]));a<sj;a++)
{
x[0]=*(data[j]+a);
x[1]='\0';
strcat(temp,x);
}
data[k]=(char *)malloc(strlen(temp)+1);
strcpy(data[k],temp);
k++;
for(c=n-1;c<=k-1;c++)
{
for(b=n;b<=k-1;b++)
{
if(c!=b &&
(strlen(data[c])==strlen(data[b])))
{
if(strncmp(data[c],data[b],strlen(data[c]))==0)
k--;
}
}
}
}
if(sj<si && i!=j)
{
strcpy(temp,"");
for(a=(strspn(data[j],data[i]));a<si;a++)
{
x[0]=*(data[i]+a);
x[1]='\0';
strcat(temp,x);
}
data[k]=(char *)malloc(strlen(temp)+1);
strcpy(data[k],temp);
k++;
for(c=n-1;c<=k-1;c++)
{
for(b=n;b<=k;b++)
{
if(c!=b &&
(strlen(data[c])==strlen(data[b])) && strncmp(data[c],data[b],strlen(data[c]))==0)
{
k--;
}
}
}
}
}
}
printf("\nThe new Array is:\n");
for(i=0;i<=k-1;i++)
{
printf("\n%s",data[i]);
}
for(i=0;i<=k-1;i++)
{
for(j=i+1;j<=k-1;j++)
{
if(strlen(data[i])==strlen(data[j]))
{
if(strncmp(data[i],data[j],strlen(data[i]))==0)
{
printf("\nCode is not uniquely decodable");
goto end;
}
}
}
}
printf("\nThe code is uniquely decodable");
end:
getch();
}
Input/Output
Enter the no of elements:3
1.1
2.10
3.100
The new Array is:
1
10
100
0
00
The code is uniquely decodable
Output 2:
Enter the no of elements:3
1.1
2.10
3.101
The new Array is:
1
10
101
0
01
1
Code is not uniquely decodable
ADVANTAGES:
We can determine whether a given set of codes is uniquely decodable before it is
used for encoding.
PRACTICAL 3
AIM: Study of Huffman Compression Algorithm
Description:
Huffman coding creates variable-length codes that are an integral number of bits long.
Symbols with higher probabilities get shorter codes. Huffman codes have the
unique-prefix property, so the decoding procedure is unambiguous; it is described below.
Huffman trees are built with a bottom-up approach: construction starts with the leaves
of the tree and works up to the root node.
Bottom-up approach:-
In the first figure, only two nodes, with weights 3 and 4, are arranged from right to
left; these nodes are the leaves of the binary Huffman tree, and the tree is then built
up to its root, as shown in figures 2 and 3.
[Figures 1-3: stages of the bottom-up construction of a Huffman tree]
Steps for building the tree:-
1. The two free nodes with the lowest weights are located.
2. A parent node for these two nodes is created and assigned a weight equal to the
total weight of its two child nodes.
3. The parent node is added to the list of free nodes, and the two child nodes are
removed from that list.
4. One child node is designated the 0 branch and the other the 1 branch.
5. Steps 1 to 4 are repeated until only one free node is left.
Example:
Suppose six symbols are laid down along with their frequencies, as shown below:
A (28)  B (6)  C (11)  D (17)  E (31)  F (7)
Solution Steps:
The two lowest-weight nodes are removed from the free list and their parent node is
added to it; repeating steps 1 to 4 above yields the Huffman tree below.
To determine the code for a given symbol, we walk from the symbol's leaf node to the
root of the Huffman tree. Unfortunately, the bits come out in the reverse of the order
we want, which means we have to push the bits onto a stack and then pop them off to
produce the code.
Study of the Huffman code (free list: E(31) A(28) D(17) C(11) F(7) B(6)):
Step 1: merge B(6) and F(7) into a parent node of weight 13.
Step 2: merge the 13 node and C(11) into a parent node of weight 24.
Step 3: merge the 24 node and D(17) into a parent node of weight 41.
Step 4: merge A(28) and E(31) into a parent node of weight 59.
Step 5: merge the 41 and 59 nodes into the root node of weight 100.
Final codes:
E = 00
A = 01
D = 10
C = 110
F = 1110
B = 1111
1. Data structure used in Huffman tree
typedef struct tree_node {
unsigned int count;
unsigned int saved_count;
int child_0;
int child_1;
} NODE;
Each node in the Huffman tree carries several pieces of information: count holds the
weight associated with the node.
saved_count preserves the weight of a node taken off the active list: when that
happens the node's count is set to 0, and the count before it was cleared is saved in
saved_count.
child_0 and child_1 point to the node's child nodes: child_0 is the child reached on a
0 bit and child_1 the child reached on a 1 bit.
Functions:-
1. count_bytes() :-
This function counts every occurrence of each character from the start of the file to
the end. The position of the file pointer is saved when the count starts and restored
when it is done.
2. output_counts():-
This means that I store runs of counts, until all the non-zero counts have been stored. At
this time the list is terminated by storing a start value of 0. Note that at least 1 run of
counts has to be stored, so even if the first start value is 0, I read it in. It also means that
even in an empty file that has no counts, I have to pass at least one count. In order to
efficiently use this format, I have to identify runs of non-zero counts. Because of the
format used, I don't want to stop a run because of just one or two zeros in the count
stream. So I have to sit in a loop looking for strings of three or more zero values in a
row. This is simple in concept, but it ends up being one of the most complicated routines
in the whole program. A routine that just writes out 256 values without attempting to
optimize would be much simpler, but would hurt compression quite a bit on small files.
3. Input_counts():-
When expanding, I have to read in the same set of counts. This is quite a bit easier than
the process of writing them out, since no decision making needs to be done. All I do is
read in first, check to see if I am all done, and if not, read in last and a string of counts.
4. build_tree():-
Build the Huffman tree after the counts have been loaded.
This function finds the two minimum-weight nodes, i.e. the lowest-frequency symbols.
At the start, all 257 nodes have their count values set to the frequency counts; a
non-zero value means the node is active. After the two lowest-frequency nodes are
found, their parent node is created with a weight equal to the total weight of the two
nodes. The counts of these two nodes are then set to 0, marking them inactive, so that
the next time around two other lowest-frequency nodes will be taken.
5. expand_node():-
Starting at the root node, a single bit at a time is read in by the decoder; if the bit is 0,
the next node is the one pointed to by the child_0 index, otherwise the one pointed to
by the child_1 index. If the symbol is the special end-of-stream symbol, we can exit
instead of outputting it.
6. void scale_counts( counts, nodes ):-
unsigned long *counts;
NODE *nodes;
In order to limit the size of my Huffman codes to 16 bits, I scale my counts down so they
fit in an unsigned char, and then store them all as initial weights in my NODE array.
The only thing to be careful of is to make sure that a node with a non-zero count doesn't
get scaled down to 0. Nodes with values of 0 don't get codes.
7. int build_tree( nodes ):-
NODE *nodes;
Building the Huffman tree is fairly simple. All of the active nodes are scanned in order to
locate the two nodes with the minimum weights. These two weights are added together
and assigned to a new node. The new node makes the two minimum nodes into its 0
child and 1 child. The two minimum nodes are then marked as inactive. This process
repeats until there is only one node left, which is the root node. The tree is done, and the
root node is passed back to the calling routine. Node 513 is used here to arbitrarily
provide a node with a guaranteed maximum value. It starts off being min_1 and min_2.
After all active nodes have been scanned, I can tell if there is only one active node left by
checking to see if min_1 is still 513.
8. void convert_tree_to_code( nodes, codes, code_so_far, bits, node ):-
NODE *nodes;
CODE *codes;
unsigned int code_so_far;
int bits;
int node;
Since the Huffman tree is built as a decoding tree, there is no simple way to get the
encoding values for each symbol out of it. This routine recursively walks through the
tree, adding the child bits to each code until it gets to a leaf. When it gets to a leaf, it
stores the code value in the CODE element, and returns.
9. void print_model( nodes, codes ):-
NODE *nodes;
CODE *codes;
If the -d command line option is specified, this routine is called to print out some of the
model information after the tree is built. Note that this is the only place that the
saved_count NODE element is used for anything at all, and in this case it is just
for diagnostic information. By the time I get here, and the tree has been built, every
active element will have 0 in its count.
10. void print_char( c ):-
int c;
The print_model routine uses this function to print out node numbers. The catch is, if it is
a printable character, it gets printed out as a character. Makes the debug output a little
easier to read.
11. void compress_data( input, output, codes ):-
FILE *input;
BIT_FILE *output;
CODE *codes;
Once the tree gets built, and the CODE table is built, compressing the data is a breeze.
Each byte is read in, and its corresponding Huffman code is sent out.
12. void expand_data( input, output, nodes, root_node ):-
BIT_FILE *input;
FILE *output;
NODE *nodes;
int root_node;
Expanding compressed data is a little harder than the compression phase. As each new
symbol is decoded, the tree is traversed, starting at the root node, reading a bit in, and
taking either the child_0 or child_1 path. Eventually, the tree winds down to a leaf node,
and the corresponding symbol is output. If the symbol is the END_OF_STREAM
symbol, it doesn't get written out, and instead the whole process terminates.
Source code :
#include<stdio.h>
#include<iostream.h>
#include<conio.h>
#include<string.h>
#include<math.h>
float sum=0;
typedef struct node
{
float prob;
node *lptr;
node *rptr;
int value;
int check;
}node;
class Huffman
{
char arr[20], sortArr[20];
float prob[20], sortProb[20];
int noofbits[20];
int comp[20][10], counter[20], nodeCount;
int prev[10], symbol[10], i, k;
public:
Huffman();
void getData();
void probability();
void sort();
void compress();
void inorder(node *);
};
Huffman:: Huffman()
{
int m, n;
for(m=0;m<20;m++)
{
counter[m]=0;
sortArr[m]=0;
}
i=0;
k=0;
getData();
}
void Huffman :: inorder(node *root)
{
if(root)
{
prev[i]=0;
if(root->value!=36)
{
symbol[k]=(int)root->value;
printf(": %c :",root->value);
for(int j=0;j<i;j++)
{
comp[k][j]=prev[j];
}
//cout<<"\n";
for(int m=0;m<i;m++)
{
cout<<comp[k][m];
}
k++;
i--;
}
else
i++;
inorder(root->rptr);
prev[i]=1;
if(root->value!=36)
{
symbol[k]=root->value;
// cout<<root->value;
// printf(": %c :",root->value);
for(int j=0;j<i;j++)
{
comp[k][j]=prev[j];
}
cout<<"\n";
for(int m=0;m<i;m++)
{
cout<<comp[k][m];
}
k++;
i=0;
}
else
i++;
inorder(root->lptr);
}
}
void Huffman :: getData()
{
int len;
clrscr();
cout<<"Enter any string : ";
cin>>arr;
len=strlen(arr);
probability();
}
void Huffman:: probability()
{
float len;
char prev, curr;
char done[10];
int flag,count;
len=strlen(arr);
for(int i=0;i<len;i++)
{
count=0;
prev=arr[i];
flag=0;
for(int j=0;j<len;j++)
{
curr=arr[j];
if(prev==curr)
{
count++;
}
}
prob[i]=count/len;
}
sort();
}
void Huffman:: sort()
{
int done[20], flag, count;
char temp, curr;
float currProb, tempInt;
int len, currInt;
len=strlen(arr);
count=0;
for(int i=0;i<len;i++)
{
flag=0;
curr=arr[i];
currProb=prob[i];
currInt=curr;
for(int j=0;j<count;j++)
{
if(currInt==done[j])
{
flag=1;
}
}
if(flag==1)
{
}
else
{
sortArr[count]=curr;
sortProb[count]=currProb;
done[count]=currInt;
count++;
}
}
for(int m=1;m<count;m++)
{
for(int n=m;n<count;n++)
{
if(sortProb[m]<sortProb[n])
{
tempInt=sortProb[n];
sortProb[n]=sortProb[m];
sortProb[m]=tempInt;
temp=sortArr[n];
sortArr[n]=sortArr[m];
sortArr[m]=temp;
}
}
}
for(i=0;i<count;i++)
{
cout<<"\nChar : "<<sortArr[i]<<" Prob : "<<sortProb[i];
sum=sum+(sortProb[i]*log(sortProb[i]));
}
sum=(-1)*sum;
nodeCount=count;
compress();
}
void Huffman :: compress()
{
int i,j,k,l,m,n,flag=0;
float tempProb, temp;
int tempCount, count, tempVal, tempInt;
node *start=new node;
start->prob=0; start->value=0;
node *tempNode=new node;
tempNode->prob=0; tempNode->value=0;
tempCount=nodeCount;
for(i=0;i<tempCount-1;i++)
{
node *node1=new node;
node1->prob=sortProb[tempCount-i-1];
node1->value=sortArr[tempCount-i-1];
node1->lptr=NULL;
node1->rptr=NULL;
if(node1->value==start->value && node1->prob==start->prob)
{
node1=start;
flag=1;
}
else if(node1->value==tempNode->value && node1->prob==tempNode->prob)
{
node1=tempNode;
flag=1;
}
node *node2=new node;
node2->prob=sortProb[tempCount-i-2];
node2->value=sortArr[tempCount-i-2];
node2->lptr=NULL;
node2->rptr=NULL;
if(node2->value==start->value && node2->prob == start->prob)
{
node2=start;
flag=1;
}
else if(node2->value==tempNode->value && node2->prob==tempNode->prob)
{
node2=tempNode;
flag=1;
}
if(flag==0)
{
tempNode=start;
}
else
flag=0;
node *node3=new node;
if(nodeCount==2)
{
node3->prob=sortProb[nodeCount-1]+sortProb[nodeCount-2];
node3->lptr=tempNode;
node3->rptr=start;
}
else
{
node3->prob=node1->prob+node2->prob;
node3->lptr=node1;
node3->rptr=node2;
}
node3->value=(int)'$';
start=node3;
nodeCount--;
tempProb=node3->prob;
tempVal=node3->value;
sortProb[nodeCount-1]=tempProb;
sortArr[nodeCount-1]=tempVal;
count=nodeCount;
for(int m=0;m<count;m++)
{
for(int n=m;n<count;n++)
{
if(sortProb[m]<sortProb[n])
{
temp=sortProb[m];
sortProb[m]=sortProb[n];
sortProb[n]=temp;
tempInt=sortArr[m];
sortArr[m]=sortArr[n];
sortArr[n]=tempInt;
}
}
}
cout<<"\n\n";
for(l=0;l<count;l++)
{
cout<<"\nChar : "<<sortArr[l]<<" Prob : "<<sortProb[l];
}
}
cout<<"\n\n";
inorder(start);
}
void main()
{
Huffman h;
cout<<"\nENTROPY : "<<sum<<" nats/symbol";
getch();
}
Output :
Enter any string : aaabbcdeee
Char : a Prob : 0.3
Char : e Prob : 0.3
Char : b Prob : 0.2
Char : d Prob : 0.1
Char : c Prob : 0.1
Char : a Prob : 0.3
Char : e Prob : 0.3
Char : b Prob : 0.2
Char : $ Prob : 0.2
Char : $ Prob : 0.4
Char : e Prob : 0.3
Char : a Prob : 0.3
Char : $ Prob : 0.6
Char : $ Prob : 0.4
Char : $ Prob : 1.0
e : 00
a : 01
b : 10
d : 110
c : 111
ENTROPY : 1.504788 nats/symbol
Advantages:-
Huffman coding performs effective data compression by reducing the redundancy
in the coding of symbols.
It generates an optimal prefix code.
Symbols that occur more frequently have shorter code words.
Disadvantages:-
Huffman codes have to be an integral number of bits long, and this can sometimes
be a problem. For example, if the probability of a character is 1/3, the optimum
number of bits to code that character is about 1.6 bits.
Non-optimal coding becomes a noticeable problem when the probability of a
character is very high.
If a statistical method could assign a 90 percent probability to a given character,
the optimal code size would be 0.15 bits, but Huffman coding would assign a
1-bit code to the symbol, which is about six times larger than necessary.
Applications:
Lossless image compression: a simple application of Huffman coding to image
compression is to generate a Huffman code for the set of values the source may
take.
Text compression: in text, we have a discrete alphabet that, in a given class, has
relatively stationary probabilities.
Audio compression: another class of data that is very suitable for compression is
CD-quality audio data.
PRACTICAL 4
AIM: Study of Shannon-Fano compression Algorithm
Description:
The first well-known method for effectively coding symbols is now known as Shannon-
Fano coding. Claude Shannon at Bell Labs and R. M. Fano at MIT developed this
method nearly simultaneously.
Shannon-Fano coding depends simply on knowing the probability of each symbol's
occurrence in the message. Given the probabilities, a table of codes can be constructed
that has several important properties:
1. Different codes have different numbers of bits.
2. Codes for symbols with high probabilities have fewer bits, and codes for symbols
with low probabilities have more bits: the more often a symbol occurs in the
message, the fewer bits its code uses.
3. Though the codes are of different bit lengths, they can be uniquely decoded.
Developing codes that vary in length according to the probability of the symbol they
encode is what makes data compression possible, and such variable-length codes can
be arranged as a binary tree.
Shannon-Fano algorithm:-
The algorithm is based on a probability model. Its description is given below:
Step 1: For the given list of symbols, develop a corresponding list of probabilities or
frequency counts, so that each symbol's relative frequency of occurrence is known.
Step 2: Sort the list of symbols according to frequency, with the most frequently
occurring symbols at the top and the least common at the bottom.
Step 3: Divide the list into two parts, with the total frequency count of the upper half
as close as possible to the total of the bottom half.
Step 4: Assign the digit 0 to the upper half of the list and 1 to the lower half, so that
codes in the first half start with 0 and codes in the second half start with 1.
Step 5: Recursively apply steps 3 and 4 to each of the two halves, subdividing groups
and adding bits to the codes until each symbol has become a corresponding code leaf
on the tree.
For the probability distribution already provided, Figure 1 illustrates the steps
involved in coding the characters.
Table 1 analyses the efficiency of the Shannon-Fano algorithm. As mentioned earlier,
the concept of entropy is used to measure the efficiency of any given algorithm. As
against a total information content of 233 bits, 237 bits are used for carrying the
message. This translates into an overhead of about 1.7% ((237-233)/233). The basic
reason for this overhead is the approximation involved in rounding off the information
content of a character to the nearest integer. For example, although 'E' has an
information content of 1.69 bits, 2 bits are used to encode it; this alone accounts for a
9.61-bit overhead (62 - 52.39). Although rounding off may also offer some saving (see
the row for 'C'), the end result is using up more bits than is required to code the
message.
Figure 1: Steps of the Shannon-Fano algorithm
Table 1: Analysis of the Shannon-Fano algorithm
Character | Occurrences | Probability | Entropy (-log2 P) | Information content | Bits used (code) | Total bits
A         | 28          | 0.28        | 1.83              | 51.24               | 2 (01)           | 56
B         | 6           | 0.06        | 4.05              | 24.30               | 4 (1111)         | 24
C         | 11          | 0.11        | 3.18              | 34.98               | 3 (110)          | 33
D         | 17          | 0.17        | 2.55              | 43.35               | 2 (10)           | 34
E         | 31          | 0.31        | 1.69              | 52.39               | 2 (00)           | 62
F         | 7           | 0.07        | 3.83              | 26.81               | 4 (1110)         | 28
Total     |             |             |                   | 233.07              |                  | 237
Steps of Figure 1 (symbols sorted in descending order: E(31) A(28) D(17) C(11) F(7) B(6)):
Step 1: Sort in descending order.
Step 2: Divide into two parts: {E, A} and {D, C, F, B}.
Step 3: Assign 0* to the upper part and 1* to the lower part.
Step 4: Split {E, A}: E = 00, A = 01.
Step 5: Split {D, C, F, B}: D = 10, {C, F, B} = 11*.
Step 6: Split {C, F, B}: C = 110, {F, B} = 111*.
Step 7: Split {F, B}: F = 1110, B = 1111.
Final codes: E = 00, A = 01, D = 10, C = 110, F = 1110, B = 1111.
Source code :
#include<stdio.h>
#include<iostream.h>
#include<conio.h>
#include<string.h>
#include<process.h>
class ShannonFano
{
char arr[20], sortArr[20];
float prob[20], sortProb[20];
int comp[20][10], counter[20];
public:
ShannonFano();
void getData();
void probability();
void sort();
void compress();
int split(int, int, int);
void print();
};
ShannonFano :: ShannonFano()
{
int m, n;
for(m=0;m<20;m++)
{
counter[m]=0;
sortArr[m]=0;
}
getData();
}
void ShannonFano :: getData()
{
int len,i=0;
char ch;
FILE *fp;
clrscr();
if((fp=fopen("c:\\dc.txt","r"))==NULL)
{
printf("CANNOT OPEN THE FILE ");
getch();
exit(0);
}
while((ch=fgetc(fp))!=EOF)
{
arr[i]=ch;
i++;
}
arr[i]='\0';
len=strlen(arr);
probability();
}
void ShannonFano :: probability()
{
float len;
char prev, curr;
char done[10];
int flag,count;
len=strlen(arr);
for(int i=0;i<len;i++)
{
count=0;
prev=arr[i];
flag=0;
for(int j=0;j<len;j++)
{
curr=arr[j];
if(prev==curr)
{
count++;
}
}
prob[i]=count/len;
}
sort();
}
void ShannonFano :: sort()
{
int done[20], flag, count;
char temp, curr;
float currProb, tempInt;
int len, currInt;
len=strlen(arr);
count=0;
for(int i=0;i<len;i++)
{
flag=0;
curr=arr[i];
currProb=prob[i];
currInt=curr;
for(int j=0;j<count;j++)
{
if(currInt==done[j])
{
flag=1;
}
}
if(flag==1)
{
}
else
{
sortArr[count]=curr;
sortProb[count]=currProb;
done[count]=currInt;
count++;
}
}
for(int m=0;m<count;m++)
{
for(int n=m;n<count;n++)
{
if(sortProb[m]<sortProb[n])
{
tempInt=sortProb[m];
sortProb[m]=sortProb[n];
sortProb[n]=tempInt;
temp=sortArr[m];
sortArr[m]=sortArr[n];
sortArr[n]=temp;
}
}
}
for(i=0;i<count;i++)
{
cout<<"\nChar : "<<sortArr[i]<<" Prob : "<<sortProb[i];
}
compress();
}
void ShannonFano :: compress()
{
float prev, curr;
int mid, len, done, flag1, flag2;
float sum1, sum2;
int cut,cut1;
flag1=1;
flag2=1;
len=strlen(sortArr);
prev=1;
sum1=0;
done=0;
sum2=0;
for(int i=0;i<len;i++)
{
sum1+=sortProb[i];
sum2=1-sum1;
if(sum1>sum2)
curr=sum1-sum2;
else
curr=sum2-sum1;
if(curr<prev)
{
prev=curr;
mid=i;
}
}
split(0,mid,0);
split(mid+1,len-1,1);
print();
}
int ShannonFano :: split(int lower, int upper, int flag)
{
float sum1, sum2, curr, prev, total;
int low, up, mid;
if(upper-lower==0)
{
comp[upper][counter[upper]]=flag;
counter[upper]++;
}
else if(upper-lower==1)
{
comp[upper][counter[upper]]=flag;
counter[upper]++;
comp[upper][counter[upper]]=1;
counter[upper]++;
comp[lower][counter[lower]]=flag;
counter[lower]++;
comp[lower][counter[lower]]=0;
counter[lower]++;
}
else
{
for(int i=lower;i<=upper;i++)
{
comp[i][counter[i]]=flag;
counter[i]++;
}
sum1=0;
sum2=0;
low=lower;
up=upper;
prev=1;
total=0;
for(int j=lower;j<=upper;j++)
{
total+=sortProb[j];
}
for(j=lower;j<=upper;j++)
{
sum1+=sortProb[j];
sum2=total-sum1;
if(sum1>sum2)
curr=sum1-sum2;
else
curr=sum2-sum1;
if(curr<prev)
{
prev=curr;
mid=j;
}
}
split(low,mid,0);
split(mid+1,up,1);
}
return 0;
}
void ShannonFano :: print()
{
int i, j, k, len, sLen;
int x, y, z;
len=strlen(sortArr);
cout<<"\n";
for(i=0;i<len;i++)
{
cout<<"\n"<<(char)sortArr[i]<<" : ";
for(j=0;j<counter[i];j++)
{
cout<<comp[i][j];
}
}
cout<<"\n\nThe compressed string is : ";
sLen=strlen(arr);
for(x=0;x<sLen;x++)
{
for(y=0;y<len;y++)
{
if(sortArr[y]==(int)arr[x])
{
for(z=0;z<counter[y];z++)
{
cout<<comp[y][z];
}
}
}
}
}
void main()
{
ShannonFano s;
getch();
}
File content :
citc_
Output :
Char : c Prob : 0.4
Char : i Prob : 0.2
Char : t Prob : 0.2
Char : _ Prob : 0.2
c : 0
i : 100
t : 101
_ : 11
The compressed string is : 0100101011
Advantages
Lower compression computing complexity than Huffman coding.
Requires fewer bits than ASCII coding.
Disadvantages
Compression is not as high as that of other algorithms such as Huffman coding,
arithmetic coding, and dictionary-based coding.
All the modelling information must be transferred from the encoder side to the
decoder side.
Applications of the Shannon-Fano algorithm:-
Design and implementation of software, video, optical, and mechanical
configurations for the acquisition and retrieval of electronic ID cards.
Patented Shannon-Fano compression techniques used to reduce a photograph to
500 bytes and a signature to 200 bytes.
Used at Cambridge Computer Associates Inc., Cambridge, MA.
PRACTICAL 5
AIM: Write a program to implement arithmetic coding.
#include<iostream.h>
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<math.h>
int count,a,i,j,k=0,l,flag=0,m=0,freq[25],sum=0,total,temp;
int q,sum1=0,sum2=0,min,diff[25]={0};
double start[10];
double r1[10],r2[10];
double low=0,high=1;
double info_cont[25],p[25];
char str[100],ch[26],temp1,range[25][25];
void main()
{
void cha();
void sortf();
void prob_range();
void low_high();
void decoding();
clrscr();
cout<<"\n Enter the string......";
gets(str);
cha();
sortf();
prob_range();
low_high();
decoding();
getch();
}
void cha()
{
ch[k]=str[0];
for(i=0;str[i]!='\0';i++)
{
for(j=0;j<=k;j++)
{
if(ch[j]==str[i])
{
flag=1;
break;
}
else
{
flag=0;
}
}
if(flag==0)
{
k++;
ch[k]=str[i];
}
}
cout<<"\n String2...";
for(i=0;i<=k;i++)
{
cout<<ch[i];
}
getch();
}
void sortf()
{
for(i=0;ch[i]!='\0';i++)
{
a=ch[i];
count=0;
for(j=0;str[j]!='\0';j++)
{
if(a==str[j])
{
count++;
}
}
freq[m]=count;
sum+=count;
m++;
}
cout<<"\nTotal char...."<<sum;
cout<<"\nchar freq ";
for(i=0;i<=k;i++)
{
cout<<"\n"<<ch[i]<<"\t"<<freq[i];
}
cout<<"\n****************************************************";
for(i=0;i<=k;i++)
{
for(j=i+1;j<=k;j++)
{
if(ch[i]>ch[j])
{
temp1=ch[i];
ch[i]=ch[j];
ch[j]=temp1;
temp=freq[i];
freq[i]=freq[j];
freq[j]=temp;
}
}
}
cout<<"\n sorting the freq......";
cout<<"\n char \t freq";
for(i=0;i<=k;i++)
{
cout<<"\n"<<ch[i]<<"\t"<<freq[i];
}
cout<<"\n*****************************************************";
getch();
}
void prob_range()
{
cout<<"\n Total Character....."<<sum;
for(i=0;i<=k;i++)
{
p[i]=(double)freq[i]/sum;
}
r1[0]=0;
r2[0]=p[0];
for(i=1;i<=k;i++)
{
r1[i]=r2[i-1];
r2[i]=r1[i]+p[i];
}
getch();
cout<<"\n Char Freq Prob. Range";
for(i=0;i<=k;i++)
cout<<"\n"<<ch[i]<<"\t"<<freq[i]<<"\t"<<p[i]<<"\t"<<
r1[i]<<"<=r<"<<r2[i];
}
void low_high()
{
double l_range(char);
double h_range(char);
double range,high_range[100],low_range[100],jj;
for(i=0;str[i]!='\0';i++)
{
range=high-low;
for(j=0;ch[j]!='\0';j++)
{
if(ch[j]==str[i])
break;
}
low_range[i]=low+range*r1[j];
high_range[i]=low+range*r2[j];
high=high_range[i];
low=low_range[i];
}
getch();
cout<<"\n CHAR LOW HIGH";
for(i=0;str[i]!='\0';i++)
{
printf("\n%c\t%.12lf\t\t%.12lf",str[i],
low_range[i],high_range[i]);
}
printf("\n \n The Encoded Number.........%.12lf",low);
}
double l_range(char c)
{
double l;
for(i=0;ch[i]!='\0';i++)
{
if(ch[i]==c)
{
l=r1[i];
break;
}
}
return l;
}
double h_range(char c)
{
double h;
for(i=0;i<=k;i++)
{
if(ch[i]==c)
{
h=r2[i];
break;
}
}
return h;
}
void decoding()
{
double encode_no;
double no;
encode_no=low;
printf("\n The Encode No...%.12lf",encode_no);
printf("\n\nChar Encoded No Low High");
getch();
for(i=0;str[i]!='\0';i++)
{
for(j=0;ch[j]!='\0';j++)
{
if(str[i]==ch[j])
{
printf("\n%c\t%.12lf\t\t%.2lf\t%.2lf",
ch[j],encode_no,r1[j],r2[j]);
encode_no=(encode_no-r1[j])/(r2[j]-r1[j]);
}
}
}
}
PRACTICAL 6
Aim: Write a program to implement lZ77 algorithm.
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<process.h>
void main()
{
char Input[80],Window[64],string[100];
int WindowIndex,FileIndex,flag;
int stack[10],top,i,offset,j,Size,size;
printf("A PROGRAM TO IMPLEMENT LZ77 ALGORITHM.\n");
printf("ENTER THE STRING TO ENCODE (STRING SHOULDN'T EXCEED 79 CHARACTERS)\n");
for(i=0;i<80;i++)
{
scanf("%c",&Input[i]);
if(Input[i]=='\n')
{
Input[i]='\0';
break;
}
}
Size=i;
WindowIndex=0;
FileIndex=0;
size=0;
for(;;)
{
offset=0;
top=0;
for(i=0;;i++)
{
string[i]=Input[FileIndex];
size++;
if(size==Size)
{
printf("%d %d %c\n",
offset+2-i,i,Input[FileIndex]);
goto end;
}
string[i+1]='\0';
flag=0;
if(i==0)
{
for(j=0;j<WindowIndex;j++)
if(Window[j]==string[0])
{
stack[top]=j;
top++;
flag=1;
offset=j;
}
}
else
{
int stacktmp[10],toptmp=0;
for(j=0;j<top;j++)
{
if(Window[stack[j]+1]==string[i])
{
flag=1;
stacktmp[toptmp]=stack[j]+1;
toptmp++;
offset=stack[j]+1;
}
}
for(j=0;j<toptmp;j++)
stack[j]=stacktmp[j];
top=toptmp;
}
if(flag==0)
{
printf("%d %d %c\n",offset+2-i,i,Input[FileIndex]);
Window[WindowIndex]=Input[FileIndex];
FileIndex++;
WindowIndex++;
break;
}
else
{
Window[WindowIndex]=Input[FileIndex];
WindowIndex++;
FileIndex++;
}
}
}
end:
getch();
}
Output
A PROGRAM TO IMPLEMENT LZ77 ALGORITHM.
ENTER THE STRING TO ENCODE (STRING SHOULDN'T EXCEED 79 CHARACTERS)
abracadabra
2 0 a
2 0 b
2 0 r
1 1 c
4 1 d
1 3 a
PRACTICAL 7
Aim: Write a program to implement lZ78 algorithm.
/* Note: this listing performs arithmetic coding over the fixed alphabet
   {a, b, c}; enter a string containing the letters a to c only. */
#include<stdio.h>
#include<conio.h>
#include<string.h>
#define MAX 3
float tag_generation(char[]);
void tag_decoding(float,int);
struct codes
{
char symbol;
float prob;
float lo_limit;
float up_limit;
};
struct codes c[MAX]={{'a',0.8,0.0,0.8},{'b',0.02,0.8,0.82},{'c',0.18,0.82,1.0}};
void main()
{
char str[50];
float tag;
clrscr();
printf("Enter any string:");
scanf("%s",str);
printf("\n");
tag=tag_generation(str);
printf("\n\nArithmetic Coding\n\n");
printf("\nTag generated:%f",tag);
printf("\n\nDeciphering the tag\n\n");
tag_decoding(tag,strlen(str));
getch();
}
float tag_generation(char *str)
{
int i=0,j;
float lower=0.0,upper=1.0,l;
for(i=0;str[i]!='\0';i++)
{
for(j=0;j<MAX;j++)
{
if(str[i]==c[j].symbol)
{
l=lower+((upper-lower)*c[j].lo_limit);
upper=lower+((upper-lower)*c[j].up_limit);
lower=l;
printf("%f\t%f\n",lower,upper);
}
}
}
return((lower+upper)/2);
}
void tag_decoding(float tag,int cnt)
{
int i=0,j;
float lower=0.0,upper=1.0,l;
while(cnt!=0)
{
for(i=0;i<MAX;i++)
{
if(tag>(lower+(upper-lower)*c[i].lo_limit)&&tag<(lower+(upper-
lower)*c[i].up_limit))
{
printf("%c\t",c[i].symbol);
l=lower+(upper-lower)*c[i].lo_limit;
upper=lower+(upper-lower)*c[i].up_limit;
lower=l;
printf("%f\t%f\t%f\n",upper,lower,upper-lower);
cnt--;
break;
}
}
}
}
PRACTICAL 8
Aim: Write a program to implement lzss algorithm.
#include<stdio.h>
#include<conio.h>
#include<string.h>
#include<process.h>
void main()
{
char Input[80],Window[64],string[100];
int WindowIndex,FileIndex,flag;
int stack[10],top,i,offset,j,Size,size;
clrscr();
printf("A PROGRAM TO IMPLEMENT LZSS ALGORITHM.\n");
printf("ENTER THE STRING TO ENCODE (STRING SHOULDN'T EXCEED 79 CHARACTERS)\n");
for(i=0;i<80;i++)
{
scanf("%c",&Input[i]);
if(Input[i]=='\n')
{
Input[i]='\0';
break;
}
}
printf("\n\nBIT OFFSET COUNT NEXT_CHAR\n");
Size=i;
WindowIndex=0;
FileIndex=0;
size=0;
for(;;)
{
offset=0;
top=0;
for(i=0;;i++)
{
string[i]=Input[FileIndex];
size++;
if(size==Size)
{
printf(" 1 %d %d %c\n",offset+2-
i,i,Input[FileIndex]);
goto end;
}
string[i+1]='\0';
flag=0;
if(i==0)
{
for(j=0;j<WindowIndex;j++)
if(Window[j]==string[0])
{
stack[top]=j;
top++;
flag=1;
offset=j;
}
}
else
{
int stacktmp[10],toptmp=0;
for(j=0;j<top;j++)
{
if(Window[stack[j]+1]==string[i])
{
flag=1;
stacktmp[toptmp]=stack[j]+1;
toptmp++;
offset=stack[j]+1;
}
}
for(j=0;j<toptmp;j++)
stack[j]=stacktmp[j];
top=toptmp;
}
if(flag==0)
{
if(i==0)
{
printf(" 0 %23c\n",Input[FileIndex]);
Window[WindowIndex++]=
Input[FileIndex++];
break;
}
else
{
printf(" 1 %d %d %c\n",
offset+2-i,i,Input[FileIndex]);
Window[WindowIndex]=Input[FileIndex];
FileIndex++;
WindowIndex++;
break;
}
}
else
{
Window[WindowIndex]=Input[FileIndex];
WindowIndex++;
FileIndex++;
}
}
}
end:
getch();
}
Output
A PROGRAM TO IMPLEMENT LZSS ALGORITHM.
ENTER THE STRING TO ENCODE (STRING SHOULDN'T EXCEED 79 CHARACTERS)
abracadabra
BIT OFFSET COUNT NEXT_CHAR
0 a
0 b
0 r
1 1 1 c
1 4 1 d
1 1 3 a