Huffman Codes

Page 1: Huffman Codes

Huffman Codes

Using Binary Files

Page 2: Huffman Codes

Getting Started

Last class we extended a program to create a Huffman code and permit the user to encode and decode messages.

We will use that program as our starting point today:

http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_13_Huffman_Codes_with_Binary_IO/

File Huffman_Code_with_Associative_Map.zip

Download, extract, build, and run.

Page 3: Huffman Codes

Program in Action

Widen window to 100.

Page 4: Huffman Codes

Binary Output

Huffman codes are useful in real life only when we output the coded message as binary.

Let's modify do_encode to output a file.

Start with a text file of ASCII 1's and 0's.

Then modify it to write a binary file.
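To see why the binary form matters, the sketch below compares the file size of the two approaches for the same coded message. The function names bytes_as_text and bytes_packed are hypothetical helpers for illustration, not part of the class program.

```cpp
#include <cstddef>
#include <string>

// Bytes needed when each code bit is written as an ASCII '0' or '1' character.
std::size_t bytes_as_text(const std::string& bit_string)
{
    return bit_string.size();              // one whole byte per bit
}

// Bytes needed when the same bits are packed eight to a byte.
std::size_t bytes_packed(const std::string& bit_string)
{
    return (bit_string.size() + 7) / 8;    // round up to a whole byte
}
```

A 16-bit coded message takes 16 bytes as text but only 2 bytes when packed, before any header that records the bit count.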

Page 5: Huffman Codes

main.cpp

Add at top of main.cpp:

#include <fstream>

Modified version of do_encode:

http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_13_Huffman_Codes_with_Binary_IO/do_encode.cpp.txt

Page 6: Huffman Codes

Modifications to do_encode()

void do_encode(void)
{
    string msg;
    string output_filename;
    ofstream outfile;
    string junk;

    // A default-constructed ofstream reports good(), so test is_open() instead.
    while (!outfile.is_open())
    {
        cout << "File name for output? ";
        cin >> output_filename;
        getline(cin, junk);     // Skip newline char
        outfile.clear();        // Reset failure flags from a previous attempt
        outfile.open(output_filename.c_str());
        if (!outfile.is_open())
        {
            cout << "Failed to open output file\n";
            cout << "Please try again\n";
        }
    }

Page 7: Huffman Codes

Modifications to do_encode()

    cout << "\n\nEnter message to encode\n";
    getline(cin, msg);
    for (size_t i = 0; i < msg.length(); ++i)
    {
        char next_char = tolower(msg[i]);
        string code = huffman_tree.Encode_Char(next_char);
        cout << code;
        outfile << code;
    }
    cout << endl << endl;
    outfile << endl << endl;
    outfile.close();
    cout << "File " << output_filename << " written\n";
}

Page 8: Huffman Codes

Clean Up Output

Comment out statements that output the tree and the code.

In main():

int main(void)
{
    cout << "This is the Huffman Code Program" << endl;
    build_huffman_tree();
    //huffman_tree.Display_List();

Page 9: Huffman Codes

In Huffman_Tree.cpp

void Huffman_Tree::Make_Decode_Tree(void)
{
    node_list.sort();
    //cout << "\nSorted list:\n";
    //Display_List();
    ...
    //cout << endl << "The Huffman Tree" << endl;
    //Display_Decode_Tree(&decode_tree_root, 0);
    //cout << endl << "The Code: " << endl;
    //Display_Code(&decode_tree_root, "");
}

Page 10: Huffman Codes

Program in Action

Examine c:\out.txt

Page 11: Huffman Codes

The Output File

Page 12: Huffman Codes

Invalid Characters

What should we do with characters that are not in the code?

Encode_Char() returns a zero-length string.

Detect the error in do_encode().

Tell the user about the error.

Skip the invalid character in the output.

Page 13: Huffman Codes

main.cpp

In do_encode():

    for (size_t i = 0; i < msg.length(); ++i)
    {
        char next_char = tolower(msg[i]);
        string code = huffman_tree.Encode_Char(next_char);
        if (code.size() == 0)
        {
            cout << endl << "Invalid character in input to do_encode: "
                 << next_char << endl;
            continue;
        }
        cout << code;
        outfile << code << " ";
    }

Page 14: Huffman Codes

Program Running

Page 15: Huffman Codes

The Output File

Page 16: Huffman Codes

Binary File I/O

Issues with binary files:

    Hardware architecture dependencies. Code is typically not portable.

    Output is by byte, not by bit.

    For Huffman coding we need variable-length bit strings, so we must know the number of bits.

Encapsulate code to do binary file I/O in classes.

Provide relatively simple interface to the rest of the program.
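The need to know the number of bits can be shown with a small sketch. Because output is by byte, the last byte is padded, and two different bit strings can produce identical bytes. The helper pack_bits below is a hypothetical stand-alone function, not one of the class files; it assumes bits are packed most significant bit first and padded with zeros.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Pack a string of '0'/'1' characters into bytes, most significant bit first.
// Unused bits in the final byte are left as 0.
std::vector<unsigned char> pack_bits(const std::string& bit_string)
{
    std::vector<unsigned char> bytes((bit_string.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bit_string.size(); ++i)
    {
        if (bit_string[i] == '1')
        {
            bytes[i / 8] |= static_cast<unsigned char>(0x80 >> (i % 8));
        }
    }
    return bytes;
}
```

Here pack_bits("101") and pack_bits("10100000") both yield the single byte 0xA0, so a reader cannot tell a 3-bit message from an 8-bit one unless the bit count is stored alongside the data.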

Page 17: Huffman Codes

Binary Output File Class

[Diagram labels: Client Classes, Bits, Bit Count, Binary Output File Class, Buffer]

Page 18: Huffman Codes

Binary Input File Class

[Diagram labels: Client Classes, Bits, Bit Count, Binary Input File Class, Buffer]

Page 19: Huffman Codes

Binary File Classes

Binary_File

    is_open
    buffer
    next_bit_position
    filename
    BUFFER_SIZE
    FIRST_BIT_POSITION
    + Is_Open

Binary_Output_File (derived from Binary_File)

    - fstream
    + Output_Bit_String
    + Close
    - Write_Buffer

Binary_Input_File (derived from Binary_File)

    - fstream
    + Get_Next_Bit
    + Close
    - Read_Buffer

Page 20: Huffman Codes

Binary File I/O

Download

http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_13_Binary_File_IO/

File Binary_File_IO_Classes.zip

Page 21: Huffman Codes

Binary File IO Classes

Copy into project folder and add to project.

Page 22: Huffman Codes

Add Binary File IO Files to Project

Build project.

Page 23: Huffman Codes

Binary_File.h

#pragma once
#include <string>
using std::string;

class Binary_File
{
public:
    Binary_File(const string& Filename);
    virtual void Close() = 0;
    bool Is_Open() const { return is_open; }

protected:
    static const int BUFFER_SIZE = 1024;    // Size in bytes
    static const int FIRST_BIT_POSITION = 8*sizeof(size_t);

    // bit_count overlays the first sizeof(size_t) bytes of bits,
    // so the data bits start at FIRST_BIT_POSITION.
    union Buffer
    {
        char bits[BUFFER_SIZE];
        size_t bit_count;
    };

    void Reset_Buffer(void);

    const string filename;
    bool is_open;
    Buffer buffer;
    size_t next_bit_position;
};

Page 24: Huffman Codes

Binary_File.cpp

#include "Binary_File.h"

// Initializers are listed in member declaration order.
Binary_File::Binary_File(const string& Filename) :
    filename(Filename), is_open(false)
{
    Reset_Buffer();
}

void Binary_File::Reset_Buffer(void)
{
    for (int i = 0; i < BUFFER_SIZE; ++i)
    {
        buffer.bits[i] = 0;
    }
    next_bit_position = FIRST_BIT_POSITION;
}

Page 25: Huffman Codes

Binary_Output_File.h

#pragma once
#include <iostream>
#include <fstream>
#include <string>
#include "Binary_File.h"
using std::string;

class Binary_Output_File : public Binary_File
{
public:
    Binary_Output_File(const string& filename);
    void Output(const string& bit_string);
    void Close();

private:
    std::fstream outfile;
    void Write_Buffer();
};

Page 26: Huffman Codes

Binary_Output_File.cpp

#include <cassert>
#include <cmath>
#include "Binary_Output_File.h"
using namespace std;

Binary_Output_File::Binary_Output_File(const string& filename) : Binary_File(filename)
{
    outfile.open(filename.c_str(), ios::out | ios::binary);
    if (outfile.fail())
    {
        string err_msg("Error opening output file ");
        err_msg += filename;
        throw err_msg;
    }
    Reset_Buffer();
    is_open = true;
}

Page 27: Huffman Codes

Binary_Output_File.cpp

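void Binary_Output_File::Write_Buffer()
{
    assert(is_open);
    if (next_bit_position == FIRST_BIT_POSITION)
    {
        return;     // Nothing buffered
    }

    // Store the number of data bits in the header at the front of the buffer.
    buffer.bit_count = next_bit_position - FIRST_BIT_POSITION;

    // Write the header plus the data bits, rounded up to a whole byte.
    size_t nr_bytes = (size_t) ceil(next_bit_position / 8.0);
    outfile.write(buffer.bits, nr_bytes);
    Reset_Buffer();
}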
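The byte count in Write_Buffer() can be worked through with integer arithmetic. The sketch below is an illustration only: bytes_written is a hypothetical helper, and the constant is assumed to match FIRST_BIT_POSITION from Binary_File.h.

```cpp
#include <cstddef>

// Assumed to mirror Binary_File::FIRST_BIT_POSITION: the data bits start
// right after a size_t header that holds the bit count.
const std::size_t FIRST_BIT_POSITION = 8 * sizeof(std::size_t);

// Number of bytes one buffer write would produce for payload_bits data bits.
std::size_t bytes_written(std::size_t payload_bits)
{
    if (payload_bits == 0)
    {
        return 0;   // An empty buffer is not written at all
    }
    std::size_t next_bit_position = FIRST_BIT_POSITION + payload_bits;
    return (next_bit_position + 7) / 8;   // Same result as ceil(n / 8.0)
}
```

For 16 data bits the result is sizeof(size_t) + 2 bytes: the header followed by two bytes of packed bits. Because the header size depends on sizeof(size_t), the file layout differs between 32-bit and 64-bit builds.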

Page 28: Huffman Codes

Binary_Output_File.cpp

void Binary_Output_File::Output(const string& bit_string)
{
    assert(is_open);
    for (size_t i = 0; i < bit_string.size(); ++i)
    {
        if (bit_string[i] == '1')
        {
            size_t byte_position = next_bit_position / 8;
            size_t bit_position_within_byte = next_bit_position % 8;
            buffer.bits[byte_position] |= (0x80 >> bit_position_within_byte);
        }
        else
        {
            assert(bit_string[i] == '0');
        }

        ++next_bit_position;
        if (next_bit_position == BUFFER_SIZE*8)
        {
            Write_Buffer();
        }
    }
}
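The position arithmetic in Output() can be isolated as follows. This is a sketch for illustration; Bit_Location and locate_bit are hypothetical names that do not appear in the class files.

```cpp
#include <cstddef>

struct Bit_Location
{
    std::size_t byte_index;   // Which byte of the buffer holds the bit
    unsigned char mask;       // Single-bit mask within that byte
};

// Map an absolute bit position to a byte index and a mask,
// counting bits from the most significant end of each byte.
Bit_Location locate_bit(std::size_t bit_position)
{
    Bit_Location loc;
    loc.byte_index = bit_position / 8;
    loc.mask = static_cast<unsigned char>(0x80 >> (bit_position % 8));
    return loc;
}
```

Bit position 10 falls in byte 1 with mask 0x20. The first data bit, at position 8*sizeof(size_t), falls in the byte immediately after the bit count header.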

Page 29: Huffman Codes

Binary_Output_File.cpp

void Binary_Output_File::Close()
{
    Write_Buffer();
    outfile.close();
    is_open = false;
}

Page 30: Huffman Codes

Using Binary File IO

Now let's modify do_encode() to write a binary file.

Add at top of main.cpp:

#include "Binary_Output_File.h"

Page 31: Huffman Codes

do_encode()

void do_encode(void)
{
    string msg;
    string output_filename;
    Binary_Output_File* outfile;
    string junk;

    while (true)
    {
        cout << "File name for output? ";
        cin >> output_filename;
        getline(cin, junk);     // Skip newline char
        try
        {
            outfile = new Binary_Output_File(output_filename);
            break;
        }
        catch (const string& msg)
        {
            cout << msg << endl;
        }
    }

Page 32: Huffman Codes

do_encode()

    cout << "Enter message to encode\n";
    getline(cin, msg);
    for (size_t i = 0; i < msg.length(); ++i)
    {
        char next_char = tolower(msg[i]);
        string code = huffman_tree.Encode_Char(next_char);
        if (code.size() == 0)
        {
            cout << endl << "Invalid character in input to do_encode: "
                 << next_char << endl;
            continue;
        }
        cout << code;
        outfile->Output(code);
    }

Page 33: Huffman Codes

do_encode()

    cout << endl << endl;
    outfile->Close();
    delete(outfile);
    cout << "File " << output_filename << " written\n";
}

Page 34: Huffman Codes

Some Test Data

Page 35: Huffman Codes

Program in Action

Page 36: Huffman Codes

c:\test.dat

Look at the output file in Visual Studio: File > Open > File

Bit Count = 16, followed by the bits 0000 0001 0010 0011
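The layout seen in the file can be reproduced in memory. The sketch below builds one block the way the slides describe it: a size_t bit count in native byte order, then the bits packed most significant bit first. The function make_block is a hypothetical helper, not part of the Binary File IO classes.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Build a header (bit count) followed by the packed bits of bit_string.
std::vector<unsigned char> make_block(const std::string& bit_string)
{
    const std::size_t header_bytes = sizeof(std::size_t);
    std::vector<unsigned char> block(header_bytes + (bit_string.size() + 7) / 8, 0);

    // The count is copied in native byte order, so the header bytes
    // differ between little-endian and big-endian machines.
    std::size_t bit_count = bit_string.size();
    std::memcpy(&block[0], &bit_count, header_bytes);

    for (std::size_t i = 0; i < bit_string.size(); ++i)
    {
        if (bit_string[i] == '1')
        {
            block[header_bytes + i / 8] |= static_cast<unsigned char>(0x80 >> (i % 8));
        }
    }
    return block;
}
```

For the bit string 0000000100100011 the two data bytes are 0x01 and 0x23, matching the bits shown for c:\test.dat. The header bytes in front of them depend on the size and byte order of size_t on the machine that wrote the file.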

Page 37: Huffman Codes

Binary Input

Now let's modify do_decode() to read a binary input file rather than reading 1's and 0's from the keyboard.

Add at top of main.cpp:

#include "Binary_Input_File.h"

http://www.cse.usf.edu/~turnerr/Data_Structures/Downloads/2011_04_13_Huffman_Codes_with_Binary_IO/do_decode.cpp.txt

Page 38: Huffman Codes

do_decode()

void do_decode(void)
{
    string msg;
    string input_filename;
    Binary_Input_File* infile;
    string junk;

    while (true)
    {
        cout << "File name for input? ";
        cin >> input_filename;
        getline(cin, junk);     // Skip newline char
        try
        {
            infile = new Binary_Input_File(input_filename);
            break;
        }
        catch (const string& msg)
        {
            cout << msg << endl;
        }
    }

Page 39: Huffman Codes

do_decode()

    string coded_message = "";
    string original_message;

    while (infile->Is_Open())
    {
        int next_bit = infile->Get_Next_Bit();
        if (next_bit < 0) break;    // No more bits
        if (next_bit == 0)
        {
            coded_message += "0";
        }
        else
        {
            coded_message += "1";
        }
    }

    // Done with the input file; release it so it does not leak.
    if (infile->Is_Open())
    {
        infile->Close();
    }
    delete infile;

    original_message = huffman_tree.Decode_Msg(coded_message);
    cout << "Original message: " << original_message << endl;
    cout << endl << endl;
}
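The slides do not show the body of Get_Next_Bit(). As a rough idea of what the read side involves, the sketch below reads bits from an in-memory byte array and returns -1 when the stored bit count is exhausted. Bit_Reader is a hypothetical class for illustration and is not the Binary_Input_File implementation.

```cpp
#include <cstddef>
#include <vector>

class Bit_Reader
{
public:
    Bit_Reader(const std::vector<unsigned char>& bytes, std::size_t bit_count)
        : bytes_(bytes), bit_count_(bit_count), next_(0) {}

    // Returns 0 or 1, or -1 once all bit_count bits have been consumed.
    int get_next_bit()
    {
        if (next_ >= bit_count_ || next_ / 8 >= bytes_.size())
        {
            return -1;
        }
        unsigned char mask = static_cast<unsigned char>(0x80 >> (next_ % 8));
        int bit = (bytes_[next_ / 8] & mask) ? 1 : 0;
        ++next_;
        return bit;
    }

private:
    std::vector<unsigned char> bytes_;
    std::size_t bit_count_;
    std::size_t next_;
};
```

With the single byte 0xA0 and a bit count of 3, successive calls return 1, 0, 1 and then -1. The padding bits in the last byte are never handed to the decoder, which is the reason the bit count is stored in the file.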

Page 40: Huffman Codes

Reading a Binary File

Page 41: Huffman Codes

Another Example