CIS601 Graduate Seminar (Presentation#02) -...

Compressing JPEG Files on a Storage Service System. CIS601 Graduate Seminar (Presentation#02). By: Daniel Izadnegahdar. Date: 11/06/17. The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes (2^50 bytes) of Image Files for a File-Storage Service. https://www.usenix.org/system/files/conference/nsdi17/nsdi17-horn-daniel.pdf

Page 1: CIS601 Graduate Seminar (Presentation#02) - …cis.csuohio.edu/~sschung/CIS601/PetaByteImageFileSystem_DanIza.pdf Slide: 1/32. ... Lepton must decompress those chunks and piece them

Compressing JPEG Files on a Storage Service System

CIS601 Graduate Seminar (Presentation#02)

By: Daniel Izadnegahdar

Date: 11/06/17

The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes (2^50 bytes) of Image Files for a File-Storage Service

https://www.usenix.org/system/files/conference/nsdi17/nsdi17-horn-daniel.pdf

Slide: 1/32

What is a File Storage Service System?

• Also known as “Centrally Hosted Network File Systems”.

• A service where users can access a shared storage space.

• Files can be uploaded, downloaded, and edited by multiple users.

• Prior file-transfer methods were limited in file size and accessibility.

Slide: 2/32

Example file storage services

• Amazon Cloud Drive

• Box

• Dropbox

• Google Drive

• Microsoft OneDrive

• SugarSync

Slide: 3/32

Challenges with file storage services

• Efficiency:

  • Many existing compression methods are inefficient and fail to reduce file size.

  • Finding the balance between quality and file size when compressing.

• Big data:

  • On Dropbox, images make up 35% of the bytes stored.

• Speed:

  • Files must be retrieved as fast as the user’s internet connection (> 100 Mbps).

• Price:

  • Storage services are often free to the user but expensive to the company, especially as the data grows.

Slide: 4/32

Focus on addressing challenges

The focus of this presentation is to address these challenges by looking at JPEG images specifically, and compressing them further.

Slide: 5/32

Challenges with existing storage service compression methods

• Existing methods include JPEGrescan, MozJPEG, and packJPG.

• JPEG is already compressed, and many existing methods can’t shrink it by more than 1%.

• Some can achieve higher compression, but are very slow.

Slide: 6/32

Introducing Lepton

• Reduces the size of JPEG images by 23% on average, saving 8% of Dropbox’s storage space.

• Lepton is a lossless compression tool developed by Dropbox to improve compression speed, efficiency, and size reduction.

• Also known as format-specific transparent file compression.

• Written in C++ and open-source.

• Image files are split into chunks of about 4 MB that are distributed across multiple servers. Lepton must be able to decompress the chunks and concatenate them properly.

• Mainly works on JPEG: Joint Photographic Experts Group (the most common format for transferring photos on the web).

• Has compressed more than 203 PiB (1 pebibyte = 2^50 bytes), saving more than 46 PiB.

• Has been deployed for 1 year on the Dropbox file-storage system.

• Can run on multiple operating systems: Linux, macOS, Windows, iOS, and Android.

Slide: 7/32
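The headline numbers on this slide can be cross-checked with a little arithmetic. The sketch below assumes the 35% image share quoted on the earlier "Challenges" slide; the constant and function names are illustrative only and do not come from Lepton.

```python
# Figures quoted on the slides; the names are illustrative, not from Lepton.
JPEG_SHARE_OF_BYTES = 0.35   # images make up about 35% of the stored bytes
AVG_JPEG_REDUCTION = 0.23    # a JPEG shrinks by about 23% on average
COMPRESSED_PIB = 203         # PiB of JPEG data processed so far
SAVED_PIB = 46               # PiB saved so far

PIB = 2 ** 50                # one pebibyte, in bytes


def overall_saving(share, reduction):
    """Fraction of total storage saved when only `share` of it shrinks."""
    return share * reduction


def observed_reduction(saved, processed):
    """Fraction saved on the bytes that were actually processed."""
    return saved / processed


print(f"overall saving: {overall_saving(JPEG_SHARE_OF_BYTES, AVG_JPEG_REDUCTION):.3f}")
print(f"observed reduction: {observed_reduction(SAVED_PIB, COMPRESSED_PIB):.3f}")
print(f"203 PiB in bytes: {COMPRESSED_PIB * PIB}")
```

The product of the two percentages lands near the 8% storage saving quoted above, and 46 PiB out of 203 PiB is close to the 23% average reduction.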

Challenges when designing Lepton

• Maintain lossless compression (round-trip transparency):

  • Maintain a lossless recovery of the original images, even after software updates that occurred after the file was saved.

• File concatenation (distribution across independent chunks):

  • Dropbox images are stored in independent 4 MB chunks across many servers. Lepton must decompress those chunks and piece them together correctly.

• Memory:

  • To preserve server resources, optimize memory management by processing the file incrementally (time multiplexing), instead of decoding the entire file into RAM.
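The first two requirements can be illustrated with a small sketch. It uses Python's general-purpose `zlib` codec purely as a stand-in (Lepton's own codec is JPEG-specific and is not shown here), and the helper names are hypothetical. Each chunk is compressed on its own, and the round trip must return the original bytes exactly.

```python
import os
import zlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB, mirroring the chunk size on the slide


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut the file into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunks(data, chunk_size=CHUNK_SIZE):
    # Every chunk is compressed independently, so a server holding one
    # compressed chunk can restore it without seeing any other chunk.
    return [zlib.compress(chunk) for chunk in split_into_chunks(data, chunk_size)]


def restore(compressed_chunks):
    """Decompress each chunk and concatenate the results in order."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed_chunks)


original = os.urandom(1024) * 10_000      # about 10 MB of stand-in file bytes
stored = compress_chunks(original)
assert restore(stored) == original        # round trip must be bit-exact
```

Because no chunk depends on another, the chunks could live on different servers and still be decoded in any order before being stitched together.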

Slide: 8/32

Lepton Performance Overview

Slide: 9/32

Lepton improving speed…

Slide: 10/32

Lepton addressing: speed

• Lepton addresses speed by using multiple threads when compressing.

• Uses pipelining, so a stage does not have to wait for the previous one to finish before it starts.

• Uses a series of encoding algorithms that are more efficient.

• Uses time multiplexing instead of saving everything to external memory.
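As a rough illustration of pipelining with bounded memory, the sketch below runs a reading stage and a compressing stage in separate threads connected by a small queue. It is not Lepton's implementation: `zlib` stands in for the real codec, and the function names and chunk size are assumptions made for the example.

```python
import queue
import threading
import zlib


def reader(data, chunk_size, out_q):
    # Stage 1: hand over chunks as soon as they are available.
    for i in range(0, len(data), chunk_size):
        out_q.put(data[i:i + chunk_size])
    out_q.put(None)  # sentinel: no more chunks


def compressor(in_q, results):
    # Stage 2: starts compressing while stage 1 is still producing chunks.
    comp = zlib.compressobj()
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        results.append(comp.compress(chunk))
    results.append(comp.flush())


def pipelined_compress(data, chunk_size=64 * 1024):
    q = queue.Queue(maxsize=4)  # bounded queue: only a few chunks in flight
    results = []
    stage1 = threading.Thread(target=reader, args=(data, chunk_size, q))
    stage2 = threading.Thread(target=compressor, args=(q, results))
    stage1.start()
    stage2.start()
    stage1.join()
    stage2.join()
    return b"".join(results)


payload = b"abc" * 100_000
assert zlib.decompress(pipelined_compress(payload)) == payload
```

The bounded queue means only a handful of chunks are held in memory at once, which is the same idea as not loading the whole file before starting work.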

Slide: 11/32

Lepton addressing: speed

• Lepton addresses speed by using multiple threads when compressing.

• More threads mean faster compression per image, as the chart on this slide shows.
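The idea of spreading one image over several threads can be sketched as follows. This is a generic illustration, assuming the data is simply cut into equal segments and using `zlib` as a stand-in codec; how Lepton actually partitions a JPEG is not described on the slide.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_segments(data, n_segments):
    """Cut `data` into at most `n_segments` pieces of nearly equal size."""
    size = max(1, -(-len(data) // n_segments))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_compress(data, n_threads=4):
    # Each segment is handed to its own worker thread.
    segments = split_segments(data, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(zlib.compress, segments))


def parallel_decompress(segments, n_threads=4):
    # pool.map preserves order, so the pieces join back correctly.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return b"".join(pool.map(zlib.decompress, segments))


sample = bytes(range(256)) * 1000
compressed = parallel_compress(sample, n_threads=4)
assert parallel_decompress(compressed) == sample
```

Splitting the input trades a little compression efficiency (each segment starts without context) for the ability to work on all segments at once.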

Slide: 12/32

Lepton improving size reduction…

Slide: 13/32

Types of Compression: Lossy / Lossless

• Lossy Compression:

  • Information is lost.

  • Eliminates unnecessary information (e.g. blocks of the same color).

  • Can achieve higher % reductions than lossless.

• Lossless Compression:

  • Image information is not lost.

  • Decoded images have the same pixel makeup as their original state.
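The difference between the two families can be seen in a toy example. The pixel values below are made up, `zlib` stands in for a lossless codec, and the "lossy" codec is a deliberately crude one that keeps only multiples of 16.

```python
import zlib

# Made-up grayscale samples (one byte per pixel).
pixels = bytes([12, 13, 13, 14, 200, 201, 203, 202] * 8)

# Lossless: decompressing gives back exactly the original bytes.
lossless = zlib.decompress(zlib.compress(pixels))

# Lossy (toy example): store only value // 16, discarding fine detail.
STEP = 16


def lossy_encode(data):
    return bytes(v // STEP for v in data)


def lossy_decode(codes):
    return bytes(c * STEP for c in codes)


lossy = lossy_decode(lossy_encode(pixels))

print("lossless round trip exact:", lossless == pixels)
print("lossy round trip exact:", lossy == pixels)
print("largest lossy error:", max(abs(a - b) for a, b in zip(pixels, lossy)))
```

The lossy version needs only 4 bits of information per pixel instead of 8, but the discarded detail cannot be recovered.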

Slide: 14/32

Lossy/Lossless Compression Examples

Image Compression

• Lossless Compression:

  • Dictionary Coding: 1) RLE 2) LZW 3) Bit Pair Encoding

  • Entropy Coding: 1) Huffman 2) Shannon-Fano 3) Arithmetic 4) Unary

  • Predictive Coding: 1) MED 2) GAP 3) GED

• Lossy Compression:

  • Chroma Subsampling

  • Lossy Predictive Coding

  • Transform Coding: 1) DCT 2) Wavelet 3) Karhunen-Loeve Transform

Slide: 15/32

Current JPEG Compression Process

Lepton addresses all stages, but mainly the encoding/decoding stage.

Slide: 16/32

Color Transform

• Convert each pixel into separate luminance and chrominance components:

  • Y’ (brightness)

  • Cb (blue chrominance)

  • Cr (red chrominance)

• The human eye perceives brightness better than color.
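A minimal sketch of the color transform, using the full-range conversion constants commonly used with JPEG (JFIF). The function names are illustrative, and the rounding and clamping policy is an assumption of this example.

```python
def clamp(value):
    """Round to the nearest integer and keep the result inside 0..255."""
    return max(0, min(255, round(value)))


def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y'CbCr (full-range JFIF constants)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return clamp(y), clamp(cb), clamp(cr)


for name, rgb in [("white", (255, 255, 255)),
                  ("black", (0, 0, 0)),
                  ("red", (255, 0, 0))]:
    print(name, rgb_to_ycbcr(*rgb))
```

Gray pixels end up with both chrominance values at the neutral midpoint of 128, which is why the two color channels can later be stored at lower resolution with little visible effect.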

Slide: 17/32

Down Sampling / Block Splitting

• Break images into 8x8 pixel blocks.

• The pixels in those blocks are each given a numeric value (the slide's figure labels sample pixels with the values 0, 99, and 47).
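Block splitting can be sketched in a few lines. The example assumes the image is a plain list of rows whose width and height are multiples of 8; real encoders pad images that are not. The function name is hypothetical.

```python
BLOCK = 8


def split_into_blocks(image):
    """Split a 2-D list of pixel values into 8x8 blocks, in row-major order.

    Assumes the width and height are multiples of 8.
    """
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, BLOCK):
        for left in range(0, width, BLOCK):
            blocks.append([row[left:left + BLOCK]
                           for row in image[top:top + BLOCK]])
    return blocks


# A made-up 16x24 image whose pixel value encodes its own position.
image = [[r * 100 + c for c in range(24)] for r in range(16)]
blocks = split_into_blocks(image)
print(len(blocks), "blocks of", len(blocks[0]), "x", len(blocks[0][0]))
```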

Slide: 18/32

Forward DCT

• Next, it computes coefficients from those scores, fitting a combination of cosine functions that matches the original data as closely as possible.

• If the curve doesn’t match well, more coefficients are calculated.
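The curve-fitting idea can be tried on a single row of eight pixels. The sketch below implements an orthonormal one-dimensional DCT and its inverse directly from the definition, then rebuilds the row from only the first few coefficients. The pixel values are made up, and JPEG itself applies the transform in two dimensions to each 8x8 block.

```python
import math


def dct(signal):
    """Orthonormal 1-D DCT-II of a list of numbers."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(signal)))
    return out


def idct(coeffs):
    """Inverse of dct(): rebuild the samples from the coefficients."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out


def approximate(signal, keep):
    """Rebuild the signal using only its first `keep` coefficients."""
    coeffs = dct(signal)
    return idct(coeffs[:keep] + [0.0] * (len(coeffs) - keep))


def squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


row = [52, 55, 61, 66, 70, 61, 64, 73]  # made-up pixel scores
for keep in (1, 2, 4, 8):
    print(keep, "coefficients, squared error:",
          round(squared_error(row, approximate(row, keep)), 3))
```

Keeping more coefficients can only lower the squared error, and keeping all eight reproduces the row up to floating-point rounding.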

[Figure: pixel scores (vertical axis) plotted against pixels (horizontal axis).]

Slide: 19/32

Forward DCT & Quantization

• DFT/DCT are ways of determining coefficients from an image, by converting Y’, Cb, and Cr into the frequency domain.

• DFT: Discrete Fourier Transform

  • Efficient, but complex and with poor energy compaction.

• DCT: Discrete Cosine Transform

  • Preferred (it is the transform JPEG uses, and Lepton works on JPEG's DCT coefficients) because it produces fewer significant coefficients for the same image.

  • The fewer the coefficients, the more efficient.

  • Use this to combine the blocks together.

• Quantization then rounds off the high-frequency coefficients, which are already hard to see with the naked eye.
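Quantization itself is just a division and a rounding step. In the sketch below both the coefficients and the step sizes are made up for illustration; real JPEG files carry their own 8x8 quantization tables.

```python
def quantize(coeffs, steps):
    """Divide each coefficient by its step size and round to an integer."""
    return [round(c / q) for c, q in zip(coeffs, steps)]


def dequantize(levels, steps):
    """Approximate the original coefficients from the stored integers."""
    return [level * q for level, q in zip(levels, steps)]


# Made-up DCT coefficients for one row, ordered from low to high frequency.
coeffs = [177.5, -12.3, 9.8, -4.1, 2.2, -1.4, 0.9, -0.3]
# Hypothetical step sizes: coarser steps for the higher frequencies.
steps = [4, 6, 8, 12, 16, 24, 32, 48]

levels = quantize(coeffs, steps)
print(levels)                     # [44, -2, 1, 0, 0, 0, 0, 0]
print(dequantize(levels, steps))  # [176, -12, 8, 0, 0, 0, 0, 0]
```

The long run of zeros at the high-frequency end is what the later encoding stage exploits. This rounding is also the step where JPEG loses information; everything after it is lossless.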

Slide: 20/32

Encoding

• Encoding is the last step in the compression process.

• It converts the data into a single structure that is easily accessed by an image application.

• Current methods involve sweeping through the image in a zigzag formation (as shown).

• The coefficients are then grouped based on frequency.
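The zigzag sweep can be generated rather than hard-coded. The sketch below walks the anti-diagonals of an 8x8 block and alternates direction on each one, which reproduces the usual JPEG scan order; the function names are illustrative.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    cells = [(r, c) for r in range(n) for c in range(n)]
    # Cells on one anti-diagonal share r + c. Odd diagonals run downwards
    # (row increasing), even diagonals run upwards (column increasing).
    return sorted(cells, key=lambda rc: (rc[0] + rc[1],
                                         rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))


def zigzag_scan(block):
    """Flatten a square block into a list following the zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


order = zigzag_order()
print(order[:6])  # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
```

Because the scan moves from the top-left (low frequency) to the bottom-right (high frequency), the zeros produced by quantization tend to end up together at the end of the list.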

Slide: 21/32

Encoding Improvements: Arithmetic Encoding

• The current method of encoding uses Huffman coding, but Lepton uses arithmetic encoding (considered 5–10% more efficient).

• A Huffman code's length is restricted to a whole number of bits per symbol, whereas arithmetic encoding can effectively spend a fraction of a bit on a symbol.

• Arithmetic encoding stores frequently occurring symbols with fewer bits, and rarely used symbols with more bits, resulting in size reduction.

• Coefficients of larger magnitude are grouped together.

• Arithmetic encoding is considered “mathematically superior” to Huffman coding as far as compression and efficiency are concerned.
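The whole-bit restriction can be made concrete by comparing the average Huffman code length with the entropy of a source, which is the limit an arithmetic coder can approach. The symbol probabilities below are made up, and the sketch computes code lengths only, not an actual arithmetic coder.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the Huffman code length, in bits, for each symbol."""
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    tie = len(heap)  # unique tie-breaker so the symbol lists are never compared
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1  # every merge adds one bit to these symbols
        heapq.heappush(heap, (p1 + p2, tie, group1 + group2))
        tie += 1
    return lengths


def average_length(probs, lengths):
    return sum(p * lengths[s] for s, p in probs.items())


def entropy(probs):
    """Shannon entropy in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


probs = {"a": 0.90, "b": 0.05, "c": 0.05}  # made-up, heavily skewed source
lengths = huffman_lengths(probs)
print("Huffman lengths:", lengths)
print("Huffman average bits/symbol:", round(average_length(probs, lengths), 3))
print("entropy bits/symbol:", round(entropy(probs), 3))
```

For this skewed source Huffman must spend at least one full bit on the most common symbol, while the entropy shows that well under one bit per symbol is theoretically enough.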

Slide: 22/32

Encoding Improvements: Probability model

• Lepton also uses probability to improve encoding.

• It assesses adjacent blocks, and uses statistics, to predict coefficients.

• It avoids complex sorting algorithms by using more probability bins. (Lepton uses 100× more bins than packJPG, which relies on sorting instead.)

Slide: 23/32

A General Overview on Probability Bins

• At the start, each bin is initialized with a 50-50 probability of being a 0 or 1.

• Based on supervised machine learning over a series of historical images, the probability is adjusted for each bin.

• As the bin values are determined, the structure begins to adapt and “learn” for the remaining bins.

• The probability model was created empirically, meaning it is based on historical image data.

• This method was tested on 300,000 images and returned a 95% success interpolation rate.
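A single adaptive bin can be sketched as a pair of counters. This is a simplified illustration rather than Lepton's actual model: the class name is hypothetical, the update rule is plain counting, and the "cost" is the ideal number of bits an arithmetic coder would spend given the bin's current estimate.

```python
import math


class ProbabilityBin:
    """Tracks how often a 0 or a 1 has been seen in one context."""

    def __init__(self):
        # One pseudo-count each, so the bin starts at a 50-50 estimate.
        self.zeros = 1
        self.ones = 1

    def p_zero(self):
        return self.zeros / (self.zeros + self.ones)

    def cost_and_update(self, bit):
        """Return the ideal cost of `bit` in bits, then learn from it."""
        p = self.p_zero() if bit == 0 else 1 - self.p_zero()
        if bit == 0:
            self.zeros += 1
        else:
            self.ones += 1
        return -math.log2(p)


def ideal_cost(bits):
    """Total ideal cost of a bit sequence coded through one adaptive bin."""
    bin_ = ProbabilityBin()
    return sum(bin_.cost_and_update(bit) for bit in bits)


skewed = [0] * 100
print("bits needed for 100 zero bits:", round(ideal_cost(skewed), 2))
```

As the bin sees more zeros its estimate moves away from 50-50, so each further zero costs less than the one before it.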

Slide: 24/32

Performance: Speed & Size Reduction

• Lepton improves size reduction by 15% on average.

• Lepton can compress 25% faster on average, compared to conventional tools.

Slide: 25/32

Performance: System Memory

• Lepton also improved on system memory requirements. It required 50% less memory on average.

Slide: 26/32

Performance: Compression uniformity on file size

• Smaller file sizes can compress slightly better but overall, % savings are relatively uniform across file sizes.

Slide: 27/32

Performance: Traffic

• Decode rates (downloads) are higher than encode rates (uploads).

• There are more uploads on weekends than downloads.

Slide: 28/32

Looking ahead…

• Compression is currently deployed on the back end; the plan is to also implement it on the front end (client side), which would save 23% of the network bandwidth used for uploading and downloading JPEG files.

• Looking to extend compression beyond JPEG files, for example to H.264 video files.

Slide: 29/32

Conclusion

• Lepton is an open-source system that compresses JPEG images by 23%.

• It has been deployed on Dropbox and has compressed more than 150 billion images, or 203 pebibytes (1 PiB = 1024^5 bytes), to date.

• Deployment has been smooth and reliable. The only errors identified were human errors.

Slide: 30/32

References

• https://blogs.dropbox.com/tech/2016/07/lepton-image-compression-saving-22-losslessly-from-images-at-15mbs/

• https://techcrunch.com/2016/07/14/dropboxs-lepton-lossless-image-compression-really-uses-a-middle-out-algorithm/

• https://en.wikipedia.org/wiki/JPEG#Downsampling

• https://news.ycombinator.com/item?id=12094002

• http://windowsreport.com/dropbox-lepton-photo-compression/

• https://venturebeat.com/2016/07/14/dropbox-open-sources-lepton-a-compression-algorithm-that-cuts-jpeg-file-size-by-22/

• https://nhs.io/lepton/

Slide: 31/32

End

Slide: 32/32