Large Files without the Trials

Aaron VanDerlip and Sally Kleinfeldt, Plone Symposium East 2010, Thursday, June 3, 2010

Sally Kleinfeldt and Aaron VanDerlip describe ore.bigfile, a minimalist solution to the problem of uploading, downloading, and versioning very large files in Plone.

Transcript of Large Files without the Trials

Page 1: Large Files without the Trials

Large Files Without the Trials

Aaron VanDerlip and Sally Kleinfeldt, Plone Symposium East 2010

Acknowledgments

• Bioneers provides environmental education and social connectivity through conferences, radio and TV, books, and online materials

• Bioneers engaged Jazkarta to build a Plone-based file asset server to help them organize, capture, and store multimedia and textual content, including files as large as 5 GB.

Acknowledgments

• Aaron VanDerlip - Project Manager

• Kapil Thangavelu - Developer

What is a Big File?

• Anything that makes you wait...

Plone Problems with Big Files

1. Uploading/Downloading

2. Versioning

Uploading Big Files

• Both the user and a Zope thread are waiting for the file transfer

Uploading Big Files

• Browser encodes the file in multipart MIME format (multipart/form-data)

• Zope must undo this encoding

• CPU and memory intensive, and SLOW

• Zope thread is blocked during this process

Downloading Big Files

• ...the same thing happens in reverse

Learning from Rails

• Get file encoding/decoding and read/write operations out of Plone

• Web servers are really good at this: Apache, Nginx, and Lighttpd

• Our implementation uses Apache

• Apache file streaming is fast and threads are cheap

Learning from Rails

• Uploads: Apache plus mod_porter http://therailsway.com/tags/porter

• Downloads: Apache plus mod_xsendfile http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/

• ...and of course ZODB Blob storage

Mod Porter

• Parses the multipart mime data

• Writes the file to disk

• Changes the Request to contain a pointer to the temp file on disk

• All done efficiently in C code inside your Apache process
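The transformation Mod Porter performs can be sketched in Python to show what it saves Zope from doing. This is not mod_porter's C code, and it does not reproduce mod_porter's actual request format: the function name `porter_like_rewrite`, the form-dict layout, and the use of the standard-library `email` parser are assumptions for illustration. Unlike mod_porter, this sketch holds the whole request body in memory, which is exactly the cost the module avoids by streaming in C.

```python
import os
import tempfile
from email import policy
from email.parser import BytesParser


def porter_like_rewrite(content_type, body, upload_dir):
    """Parse a multipart/form-data body, write file parts to disk, and return
    a form dict in which each file field holds a path instead of the data.
    Hypothetical helper; it only mimics the idea behind mod_porter."""
    # The email parser expects headers, so prepend the request's Content-Type.
    header = f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
    message = BytesParser(policy=policy.HTTP).parsebytes(header + body)
    form = {}
    for part in message.iter_parts():
        name = part.get_param("name", header="content-disposition")
        filename = part.get_filename()
        data = part.get_payload(decode=True)
        if filename is None:
            # Ordinary form field: keep its value inline.
            form[name] = data.decode("utf-8")
            continue
        # File field: write the bytes to disk and pass along only a pointer.
        fd, path = tempfile.mkstemp(dir=upload_dir)
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        form[name] = {
            "path": path,
            "filename": filename,
            "content_type": part.get_content_type(),
        }
    return form
```

After this step the application only ever sees a short path string, however large the upload was.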

Apache Config for Mod Porter

LoadModule apreq_module /usr/lib/apache2/modules/mod_apreq2.so
LoadModule porter_module /usr/lib/apache2/modules/mod_porter.so
# mod_apreq2 (libapreq2) has a default read limit of 64MB; set it higher
APREQ2_ReadLimit 2G
...
Porter On
# Files below this size will not be handled by mod_porter
PorterMinSize 14M
# Where the uploaded files are stored
PorterDir /mnt/uploads-apache

X-Sendfile

• HTTP header

• Set an X-Sendfile header on your response, with the path of the file as its value

• Apache does the rest
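A minimal sketch of the same idea outside Zope, assuming a plain WSGI application running behind Apache with mod_xsendfile enabled: the application only names the file, and the web server streams it. The `FILES` lookup table and the blob path in it are hypothetical; in Plone the path would come from the blob itself, and it has to sit under the directory Apache allows with `XSendFilePath`.

```python
import os

# Hypothetical mapping from download ids to files in blob storage.
FILES = {"keynote": "/mnt/bioneers/var/blobstorage/example.blob"}


def app(environ, start_response):
    """WSGI app that never reads the file; it only tells Apache which one to send."""
    key = environ.get("PATH_INFO", "").strip("/")
    path = FILES.get(key)
    if path is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    start_response("200 OK", [
        ("Content-Type", "application/octet-stream"),
        ("Content-Disposition",
         'attachment; filename="%s"' % os.path.basename(path)),
        # mod_xsendfile removes this header and streams the file itself.
        ("X-Sendfile", path),
    ])
    return [b""]  # empty body: Apache supplies the file contents
```

The application thread is free as soon as it returns, while Apache spends its own cheap thread on the long transfer.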

Apache Config for X-Sendfile

LoadModule xsendfile_module /usr/lib/apache2/modules/mod_xsendfile.so
...
EnableSendfile On
XSendFile on
# Config to send file resources directly from blob storage
XSendFilePath /mnt/bioneers/var/blobstorage

Using X-Sendfile from Python

def download(self, response, file_path):
    # Hand the transfer to Apache's mod_xsendfile; Zope sends no file data
    response.setHeader("X-Sendfile", file_path)

Blob Storage

• Uploads

  • Blob.consumeFile moves the file from Apache’s temp area to blob storage (ZODB/blob.py)

  • Uses os.rename, so the file's bytes never pass through Plone

• Downloads

  • Served directly from blob storage
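Why the upload step is cheap can be shown with the standard library alone. This sketch is not ZODB's `Blob.consumeFile`; `consume_upload` is a hypothetical helper that moves an upload into a storage directory with `os.rename`, so no file bytes are read or copied. It also shows the catch: a rename only works within one filesystem, so the upload directory and blob storage need to share a volume.

```python
import os


def consume_upload(temp_path, blob_dir, blob_name):
    """Move an uploaded temp file into blob_dir without copying its bytes."""
    target = os.path.join(blob_dir, blob_name)
    # On POSIX, a rename within one filesystem only updates directory entries,
    # so a 5 GB file moves as quickly as a 5 KB one. Across filesystems it
    # raises OSError (EXDEV); this sketch deliberately has no copy fallback.
    os.rename(temp_path, target)
    return target
```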

Upload Process [slide image]

What About Really Really Big Files?

• Use FTP

• Supports continuation (resuming interrupted transfers) and batching

• Handles files too large for browser limits

• Content editors use FTP to transfer files to an upload directory
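On the client side, continuation amounts to asking the server how much of the file it already holds and sending only the rest. Below is a sketch using Python's `ftplib`, assuming the FTP server supports the `SIZE` command (RFC 3659) and `APPE` (RFC 959). `resume_upload` is a hypothetical helper, not part of ore.bigfile; in practice editors would use a desktop FTP client that does this for them.

```python
import ftplib
import os


def resume_upload(ftp, local_path, remote_name, blocksize=1024 * 1024):
    """Upload local_path to remote_name, skipping bytes the server already has.

    Returns the number of bytes sent by this call.
    """
    ftp.voidcmd("TYPE I")  # binary mode, so SIZE reports exact byte counts
    try:
        offset = ftp.size(remote_name) or 0
    except ftplib.error_perm:
        offset = 0  # no partial copy on the server yet
    total = os.path.getsize(local_path)
    if offset >= total:
        return 0  # nothing left to send (or the remote copy is larger)
    with open(local_path, "rb") as fp:
        fp.seek(offset)
        # APPE appends to the partial remote file; STOR starts a new one.
        verb = "APPE" if offset else "STOR"
        ftp.storbinary(f"{verb} {remote_name}", fp, blocksize)
    return total - offset
```

Called again after a dropped connection, the same function picks up where the previous attempt stopped.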

UI [slide image]

Uploading with FTP [slide image]

ore.bigfile

• Minimally intrusive, works with the grain of Plone

• Provides a Big File content type

• IFrontendFileServer interface defines two methods that provide web server support for upload and download

• Apache and Nginx implementations provided
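One way to picture `IFrontendFileServer` is as a small abstract base class with one implementation per web server. This is a hedged sketch, not ore.bigfile's code. The method names, the form-field layouts assumed for uploads, and the Nginx internal location `/protected-blobs/` are all hypothetical, and `abc.ABC` stands in for a Zope interface. The download halves use headers the servers really honor: mod_xsendfile reads a filesystem path from `X-Sendfile`, while Nginx's `X-Accel-Redirect` expects a URI that maps to an `internal` location.

```python
import os
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Stand-in for IFrontendFileServer; method names are hypothetical."""

    @abstractmethod
    def download_headers(self, blob_path):
        """Return response headers that make the web server send blob_path."""

    @abstractmethod
    def uploaded_file_path(self, form, field):
        """Return the temp-file path the web server stored for an upload field."""


class ApacheFileServer(FrontendFileServer):
    def download_headers(self, blob_path):
        # mod_xsendfile takes a filesystem path (inside XSendFilePath).
        return {"X-Sendfile": blob_path}

    def uploaded_file_path(self, form, field):
        # Assumed layout: the upload field holds a dict with a "path" key.
        return form[field]["path"]


class NginxFileServer(FrontendFileServer):
    def __init__(self, blob_root, internal_prefix="/protected-blobs/"):
        self.blob_root = blob_root
        self.internal_prefix = internal_prefix  # must match an `internal` location

    def download_headers(self, blob_path):
        # X-Accel-Redirect takes a URI, so map the blob path onto the location.
        relative = os.path.relpath(blob_path, self.blob_root)
        return {"X-Accel-Redirect": self.internal_prefix + relative.replace(os.sep, "/")}

    def uploaded_file_path(self, form, field):
        # Assumed layout: a separate "<field>.path" form field carries the path.
        return form[field + ".path"]
```

Keeping the server-specific details behind one small interface is what lets the content type itself stay unaware of which web server sits in front of Plone.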

ore.bigfile Limitations

• Upload directory is hardcoded

• Possible errors on very large images, because Mod Porter intercepts those uploads too

Versioning Big Files

Solution

• Bypass CMFEditions, so there is no file size limitation

• Create a new version only when the file changes, not when metadata changes

• Allow old versions to be purged

• Version information is stored on the Big File object using annotations
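The versioning rules on this slide fit in a few lines. This sketch uses a plain dict in place of the object's annotations. The key `bigfile.versions`, the entry layout, and the `fingerprint` argument (anything that changes when the file bytes change, such as a checksum) are assumptions rather than ore.bigfile's actual storage format.

```python
import time

VERSIONS_KEY = "bigfile.versions"  # hypothetical annotation key


def add_version(annotations, blob_path, fingerprint):
    """Append a version entry, but only if the file content changed.

    `annotations` is any mutable mapping standing in for object annotations.
    Returns the new entry, or None when the fingerprint is unchanged
    (a metadata-only edit).
    """
    versions = annotations.setdefault(VERSIONS_KEY, [])
    if versions and versions[-1]["fingerprint"] == fingerprint:
        return None
    entry = {
        # Numbers keep increasing even after older entries are purged.
        "number": versions[-1]["number"] + 1 if versions else 1,
        "blob_path": blob_path,
        "fingerprint": fingerprint,
        "created": time.time(),
    }
    versions.append(entry)
    return entry


def purge_versions(annotations, keep):
    """Drop all but the newest `keep` versions; return the dropped entries
    so the caller can delete their blob files."""
    if keep < 1:
        raise ValueError("keep must be at least 1 (the current version)")
    versions = annotations.get(VERSIONS_KEY, [])
    dropped = versions[:-keep]
    annotations[VERSIONS_KEY] = versions[-keep:]
    return dropped
```

With real ZODB annotations, the list would need to be a `PersistentList` (or be reassigned after each change) so that in-place appends are saved.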

UI [slide image]

Conclusion

• ore.bigfile solves the Big File problem for a particular use case; it is not feature complete

• It does so by taking advantage of mature web server technology

• The code is minimally intrusive

• It provides a strategy for implementation we can learn from as we improve Plone’s Big File story
