How GZIP works... in 10 minutes

How GZIP Compression Works Raul Fraile …in 10 minutes


Slides of the talk at the deSymfonyDay unconference

Transcript of How GZIP works... in 10 minutes

Page 1: How GZIP works... in 10 minutes

How GZIP Compression Works …in 10 minutes (Raul Fraile)

Page 2: How GZIP works... in 10 minutes

About me

• PHP/Symfony2 developer at

• PHP 5.3 Zend Certified Engineer

• Symfony Certified Developer

• BS in Computer Science. Ms(Res) student in Computing Technologies.

• Open source: LadybugPHP

Page 3: How GZIP works... in 10 minutes

What is GZIP?

• GZIP is a lossless compression method: decompressing the output recovers the original data exactly.

• It has become the de facto lossless compression method for textual data on websites.
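As a quick illustration of "lossless", the following minimal sketch compresses a string with PHP's zlib extension and checks that decompression returns exactly the same bytes. The sample text is made up for the example.

```php
<?php
// Sample input (an assumption for this sketch): 50 copies of one sentence.
$original = str_repeat("GZIP is a lossless compression method.\n", 50);

$compressed = gzencode($original);   // gzip-format output
$restored   = gzdecode($compressed); // back to the original bytes

var_dump($restored === $original);                 // bool(true)
var_dump(strlen($compressed) < strlen($original)); // bool(true)
```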

Page 4: How GZIP works... in 10 minutes

What is GZIP?

Browser → Web server:

GET index.html
Accept-Encoding: gzip
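The server side of this exchange could be sketched as follows. `negotiateGzip` is a hypothetical helper, not part of PHP or any framework, and it deliberately ignores quality values such as `gzip;q=0`; real servers usually handle this negotiation themselves.

```php
<?php
// Hypothetical helper: gzip the body only if the client's Accept-Encoding
// header lists "gzip". Simplification: q-values are ignored.
function negotiateGzip(string $acceptEncoding, string $body): array
{
    $encodings = array_map(
        fn (string $item): string => strtolower(trim(explode(';', $item)[0])),
        explode(',', $acceptEncoding)
    );

    if (in_array('gzip', $encodings, true)) {
        return [
            'headers' => ['Content-Encoding: gzip', 'Vary: Accept-Encoding'],
            'body'    => gzencode($body),
        ];
    }

    // Client did not advertise gzip: send the body untouched.
    return ['headers' => ['Vary: Accept-Encoding'], 'body' => $body];
}

$response = negotiateGzip('gzip, deflate', '<h1>Hello</h1>');
print_r($response['headers']);
```

The `Content-Encoding: gzip` response header tells the browser to decompress the body before using it.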

Page 5: How GZIP works... in 10 minutes

How does it work?

• It is based on the DEFLATE algorithm, which is a combination of LZ77 and Huffman coding.

• First, the LZ77 algorithm replaces repeated occurrences of data with references.

• Second, Huffman coding assigns shorter codes to more frequent “characters”.
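To see that a gzip file is a DEFLATE stream inside a thin wrapper, this sketch peels the wrapper off by hand. It assumes the 10-byte header that `gzencode` writes when no optional fields are set and the 8-byte trailer (CRC-32 plus original size) defined by the gzip format, and that PHP runs on a 64-bit build.

```php
<?php
$data = "This file is huge! That's because the file is not compressed";
$gzip = gzencode($data);

// Magic bytes 1f 8b, then 08 = "compression method: DEFLATE".
var_dump(bin2hex(substr($gzip, 0, 3))); // string(6) "1f8b08"

// Strip the 10-byte header and 8-byte trailer: what is left is raw DEFLATE.
$deflateStream = substr($gzip, 10, -8);
var_dump(gzinflate($deflateStream) === $data); // bool(true)

// The trailer stores a CRC-32 and the uncompressed size (little-endian).
$trailer = unpack('Vcrc/Vsize', substr($gzip, -8));
var_dump($trailer['crc'] === crc32($data));   // bool(true)
var_dump($trailer['size'] === strlen($data)); // bool(true)
```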

Page 6: How GZIP works... in 10 minutes

How does it work?

LZ77

This file is huge! That's because the file is not compressed

The second “ file is ” is replaced by the reference <33, 9>: go back 33 characters and copy 9.
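The reference on this slide can be reproduced with a deliberately naive LZ77 sketch. `lz77Tokens` and `lz77Decode` are hypothetical names; the brute-force search and the greedy choice are simplifications, and real DEFLATE encoders use hash chains and smarter match selection. The minimum match length of 3 and maximum of 258 follow DEFLATE's limits.

```php
<?php
// Naive LZ77: emit single characters, or [distance, length] for repeats.
function lz77Tokens(string $text, int $minLength = 3, int $window = 32768): array
{
    $tokens = [];
    $n = strlen($text);
    $i = 0;
    while ($i < $n) {
        $bestLen = 0;
        $bestDist = 0;
        // Brute-force search of the window for the longest earlier match.
        for ($j = max(0, $i - $window); $j < $i; $j++) {
            $len = 0;
            while ($i + $len < $n && $len < 258 && $text[$j + $len] === $text[$i + $len]) {
                $len++;
            }
            if ($len > $bestLen) {
                $bestLen = $len;
                $bestDist = $i - $j;
            }
        }
        if ($bestLen >= $minLength) {
            $tokens[] = [$bestDist, $bestLen];
            $i += $bestLen;
        } else {
            $tokens[] = $text[$i];
            $i++;
        }
    }
    return $tokens;
}

// Rebuild the text by copying $len characters from $dist positions back.
function lz77Decode(array $tokens): string
{
    $out = '';
    foreach ($tokens as $token) {
        if (is_array($token)) {
            [$dist, $len] = $token;
            for ($k = 0; $k < $len; $k++) {
                $out .= $out[strlen($out) - $dist];
            }
        } else {
            $out .= $token;
        }
    }
    return $out;
}

$sentence = "This file is huge! That's because the file is not compressed";
foreach (lz77Tokens($sentence) as $token) {
    if (is_array($token)) {
        printf("<%d, %d>\n", $token[0], $token[1]); // prints <8, 3> then <33, 9>
    }
}
```

With a minimum match length of 3, the sketch also finds the short repeat “is ” (<8, 3>) in addition to the <33, 9> reference shown on the slide.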

Page 7: How GZIP works... in 10 minutes

How does it work?

Huffman coding

“compressed”

c: 1 o: 1 m: 1 p: 1 r: 1 e: 2 s: 2 d: 1

8-bit ASCII (80 bits): 01100011 01101111 01101101 01110000 01110010 01100101 01110011 01110011 01100101 01100100

Huffman codes (30 bits): 1100 011 010 000 001 111 10 10 111 1101
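The code table on this slide can be approximated with the classic Huffman construction: repeatedly merge the two least frequent nodes. In this sketch `huffmanCodes` and `huffmanEncode` are hypothetical helpers. Ties between equal counts may be broken differently than on the slide, so individual codes can differ, but the total encoded length for this word is the same. DEFLATE itself uses length-limited canonical Huffman codes, which this sketch does not attempt.

```php
<?php
// Build [symbol => bit string] by repeatedly merging the two rarest nodes.
function huffmanCodes(string $text): array
{
    $queue = new SplPriorityQueue();
    $codes = [];
    foreach (count_chars($text, 1) as $byte => $count) {
        $codes[chr($byte)] = '';
        // SplPriorityQueue pops the highest priority first, so negate counts.
        $queue->insert(['count' => $count, 'symbols' => [chr($byte)]], -$count);
    }

    while ($queue->count() > 1) {
        $a = $queue->extract();
        $b = $queue->extract();
        // Every symbol under the merged node gains one leading bit.
        foreach ($a['symbols'] as $s) {
            $codes[$s] = '0' . $codes[$s];
        }
        foreach ($b['symbols'] as $s) {
            $codes[$s] = '1' . $codes[$s];
        }
        $merged = [
            'count'   => $a['count'] + $b['count'],
            'symbols' => array_merge($a['symbols'], $b['symbols']),
        ];
        $queue->insert($merged, -$merged['count']);
    }
    return $codes;
}

// Concatenate the code of each character into one bit string.
function huffmanEncode(string $text, array $codes): string
{
    $bits = '';
    foreach (str_split($text) as $ch) {
        $bits .= $codes[$ch];
    }
    return $bits;
}

$codes = huffmanCodes('compressed');
$bits  = huffmanEncode('compressed', $codes);
printf("%d bits instead of %d\n", strlen($bits), 8 * strlen('compressed'));
// prints "30 bits instead of 80"
```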

Page 8: How GZIP works... in 10 minutes

Why GZIP?

• GZIP is not the best compression method, but there are a few good reasons to use it.

• It provides a good tradeoff between speed and compression ratio.

• Newer compression methods are difficult to roll out.
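One way to get a feel for the speed/ratio tradeoff is to time `gzencode` at several compression levels. `compareLevels` is a hypothetical helper and the sample markup is invented for the example; timings and sizes depend on the machine, the zlib build, and the input, so no expected numbers are shown.

```php
<?php
// Compress the same data at several levels, recording size and time taken.
function compareLevels(string $data, array $levels = [1, 6, 9]): array
{
    $results = [];
    foreach ($levels as $level) {
        $start = microtime(true);
        $compressed = gzencode($data, $level);
        $results[$level] = [
            'bytes'   => strlen($compressed),
            'seconds' => microtime(true) - $start,
        ];
    }
    return $results;
}

// Invented sample input: repetitive HTML, as often found on websites.
$sample = str_repeat("<li class=\"item\">How GZIP works... in 10 minutes</li>\n", 2000);

foreach (compareLevels($sample) as $level => $result) {
    printf("level %d: %d bytes in %.5f s\n", $level, $result['bytes'], $result['seconds']);
}
```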

Page 9: How GZIP works... in 10 minutes

Implementations

• GNU GZIP

• 7-zip

• Zopfli

Different implementations, different results

Page 10: How GZIP works... in 10 minutes

GZIP + PHP

$originalFile = __DIR__ . '/jquery-1.11.0.min.js';
$gzipFile = __DIR__ . '/jquery-1.11.0.min.js.gz';

$originalData = file_get_contents($originalFile);
$gzipData = gzencode($originalData, 9);
file_put_contents($gzipFile, $gzipData);

var_dump(filesize($originalFile)); // int(96380)
var_dump(filesize($gzipFile)); // int(33305)

Page 11: How GZIP works... in 10 minutes

Beyond GZIP

• Preprocessing the text can have an impact on the compression ratio.

• How? By optimizing matches, so that LZ77 finds longer and more frequent repeats.

Page 12: How GZIP works... in 10 minutes

Beyond GZIP

Page 13: How GZIP works... in 10 minutes

Beyond GZIP

Transposing JSON

Before:

{ "name": "Raul", "country": "Spain" },
{ "name": "Pablo", "country": "USA" },
{ "name": "Pedro", "country": "Spain" }

After:

{ "name": [ "Raul", "Pablo", "Pedro" ], "country": [ "Spain", "USA", "Spain" ] }
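The transposition itself could be sketched like this. `transposeRecords` is a hypothetical helper and assumes every record has the same keys; the receiving side would need the inverse transformation, which is not shown.

```php
<?php
// Turn a list of records into one array per key ("column-oriented" JSON).
// Assumption: all records share the same set of keys.
function transposeRecords(array $records): array
{
    $columns = [];
    foreach ($records as $record) {
        foreach ($record as $key => $value) {
            $columns[$key][] = $value;
        }
    }
    return $columns;
}

$records = [
    ['name' => 'Raul',  'country' => 'Spain'],
    ['name' => 'Pablo', 'country' => 'USA'],
    ['name' => 'Pedro', 'country' => 'Spain'],
];

echo json_encode(transposeRecords($records)), "\n";
// {"name":["Raul","Pablo","Pedro"],"country":["Spain","USA","Spain"]}
```

Each key now appears once instead of once per record, and similar values sit next to each other.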

Page 14: How GZIP works... in 10 minutes

Beyond GZIP

Ordering XML/HTML attributes

Mixed quote styles, different attribute order: 17.76%

<input id='f1' class='field' name="f1" type="text" />
<input class="field" id="f2" type="text" name="f2" />

Consistent quotes, different attribute order: 27.10%

<input id="f1" class="field" name="f1" type="text" />
<input class="field" id="f2" type="text" name="f2" />

Consistent quotes, same attribute order: 38.32%

<input id="f1" class="field" name="f1" type="text" />
<input id="f2" class="field" name="f2" type="text" />

Consistent quotes, same attribute order with type first: 38.32%

<input type="text" class="field" id="f1" name="f1" />
<input type="text" class="field" id="f2" name="f2" />
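A rough way to measure the effect of such preprocessing is to compare how much raw DEFLATE saves on each variant. `deflateSavings` is a hypothetical helper; the exact percentages depend on the compressor and its settings, so they are not expected to match the figures on the slide, only the direction of the change.

```php
<?php
// Fraction of bytes saved by raw DEFLATE at the given level.
function deflateSavings(string $data, int $level = 9): float
{
    return 1 - strlen(gzdeflate($data, $level)) / strlen($data);
}

// First variant from the slide: mixed quotes, different attribute order.
$mixed = <<<'HTML'
<input id='f1' class='field' name="f1" type="text" />
<input class="field" id="f2" type="text" name="f2" />
HTML;

// Third variant from the slide: consistent quotes, same attribute order.
$ordered = <<<'HTML'
<input id="f1" class="field" name="f1" type="text" />
<input id="f2" class="field" name="f2" type="text" />
HTML;

printf("mixed:   %5.2f%%\n", 100 * deflateSavings($mixed));
printf("ordered: %5.2f%%\n", 100 * deflateSavings($ordered));
```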

Page 15: How GZIP works... in 10 minutes

Thank you!