PDF.JS at SwissJeese 2012
![Page 1: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/1.jpg)
Julian Viereck
@jviereck, +julian.viereck
![Page 2: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/2.jpg)
Overview
• What is PDF.JS about (5 min)
• How PDF is structured & processing in PDF.JS (10 min)
• “Why are you doing this?” (15 min)
• Firefox Integration (5 min)
• What’s next? (5 min)
• Demo (15 min)
• Q & A (5 min)
![Page 3: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/3.jpg)
About me
• Bespin / Skywriter
• Ace
• Firefox DevTools
• ETH Zurich (Physics)
• PDF.JS
• ?
![Page 4: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/4.jpg)
PDF Viewer using Open Web Standards
![Page 5: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/5.jpg)
What is PDF.JS
• building faithful & efficient PDF viewer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source (Github)
![Page 6: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/6.jpg)
What is PDF.JS
• Not Firefox-Specific - all modern browsers
• 1.3 MB uncompressed JS
• ~ 33,000 lines of code
• viewer in different languages
• async API
![Page 7: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/7.jpg)
How PDF is structured
PDF file:
• Header: PDF version
• Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef Table: mapping objID ⇔ byte offset
• Trailer: root objID, xRef byte offset (root obj = ref to pages catalog)
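The trailer entries listed above can be read from the end of the file. A minimal sketch, assuming a classic uncompressed trailer and a single revision (no incremental updates); `readTrailer` and the sample data are made up for illustration and are not PDF.JS code:

```javascript
// Sketch only: locate the xRef table and the root object from the end of a PDF.
// Assumes a classic (uncompressed) trailer; real files need far more care.
function readTrailer(pdfText) {
  // "startxref" is followed by the byte offset of the xRef table.
  const startXRef = /startxref\s+(\d+)\s+%%EOF\s*$/.exec(pdfText);
  // The trailer dictionary names the root object as "/Root <objID> <generation> R".
  const root = /trailer[\s\S]*?\/Root\s+(\d+)\s+(\d+)\s+R/.exec(pdfText);
  if (!startXRef || !root) {
    throw new Error("no classic trailer found");
  }
  return {
    xRefOffset: Number(startXRef[1]),
    rootObjId: Number(root[1]),
    rootGeneration: Number(root[2]),
  };
}

// Hypothetical tail of a small PDF file, for illustration.
const sampleTail = [
  "xref",
  "0 2",
  "0000000000 65535 f ",
  "0000000009 00000 n ",
  "trailer",
  "<< /Size 2 /Root 1 0 R >>",
  "startxref",
  "408",
  "%%EOF",
  "",
].join("\n");

const trailer = readTrailer(sampleTail);
console.log(trailer);
```

With the xRef offset and the root objID in hand, a reader can jump to the xRef table and look up the byte offset of the root object.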
![Page 8: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/8.jpg)
Let’s look at it
![Page 9: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/9.jpg)
Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• PartialEvaluator: read & convert all PDF cmds ➟ OL (OperationList)
• load required objects (fonts, images)
• CanvasGraphics: graphics.executeOperatorList(OL)
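The last step, replaying an operator list against a graphics object, can be sketched as follows. The `{ fn, args }` entry shape and the recording graphics object are assumptions for illustration, not the actual PDF.JS data structures:

```javascript
// Sketch only: an operator list as an array of { fn, args } entries.
// The entry shape and the recording "graphics" object are illustrative assumptions.
function executeOperatorList(graphics, operatorList) {
  for (const { fn, args } of operatorList) {
    if (typeof graphics[fn] !== "function") {
      throw new Error("unsupported operator: " + fn);
    }
    graphics[fn](...args);
  }
}

// Stand-in for a canvas-backed graphics object: it only records the calls it receives.
function createRecordingGraphics() {
  const calls = [];
  const record = (name) => (...args) => calls.push(name + "(" + args.join(", ") + ")");
  return {
    calls,
    moveTo: record("moveTo"),
    lineTo: record("lineTo"),
    stroke: record("stroke"),
  };
}

const graphics = createRecordingGraphics();
executeOperatorList(graphics, [
  { fn: "moveTo", args: [50, 600] },
  { fn: "lineTo", args: [400, 600] },
  { fn: "stroke", args: [] },
]);
console.log(graphics.calls.join("\n"));
```

Keeping the list as plain data is what allows it to be built in one place and executed in another.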
![Page 10: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/10.jpg)
Execution Example
• “get page 2”
• PartialEvaluator builds drawing cmds: draw(obj#3, dict.x, dict.y)
• Graphics asks Data: obj#3? dict.x, .y?
• Data: obj#3 = “foo”, x = 20, y = 30
• Graphics: draw on canvas
![Page 11: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/11.jpg)
Problem Processing
• Extracting data slow (compressed)
• Transform data (images) slow
• Sometimes a lot of objects on page
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, postMessage
![Page 12: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/12.jpg)
Main Thread ⇄ Web Worker
• Main Thread ➟ Web Worker: “get page 2”, data
• Web Worker: PartialEvaluator + Data builds OpList: draw(obj#3, dict.x, dict.y) ➟ draw(“foo”, 20, 30)
• Web Worker ➟ Main Thread: Operation List + Data
• Main Thread: Graphics draw on canvas
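Because the main thread cannot reach into the worker's memory, references have to be resolved before the list is posted. A rough sketch of that idea, where `objects`, `dict`, and the `{ ref }` / `{ dictKey }` markers are hypothetical stand-ins for the parsed PDF data:

```javascript
// Sketch only: what the worker side might do before calling postMessage.
// `objects` and `dict` are hypothetical stand-ins for the parsed PDF data.
function resolveOperatorList(operatorList, objects, dict) {
  const resolve = (arg) => {
    if (arg && arg.ref !== undefined) return objects[arg.ref]; // obj#3 -> "foo"
    if (arg && arg.dictKey !== undefined) return dict[arg.dictKey]; // dict.x -> 20
    return arg;
  };
  return operatorList.map(({ fn, args }) => ({ fn, args: args.map(resolve) }));
}

const unresolved = [
  { fn: "draw", args: [{ ref: 3 }, { dictKey: "x" }, { dictKey: "y" }] },
];
const resolved = resolveOperatorList(unresolved, { 3: "foo" }, { x: 20, y: 30 });

// postMessage copies its argument (structured clone); a JSON round trip
// imitates that copy here so the sketch runs without a real worker.
const received = JSON.parse(JSON.stringify(resolved));
console.log(received);
```

The main thread then only sees plain values such as draw(“foo”, 20, 30) and never needs to ask the worker for obj#3.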
![Page 13: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/13.jpg)
PDF content stream:
5 0 obj
<< /Length 8 0 R >>
stream
/GS1 gs
/F0 12 Tf
BT
100 700 Td
(Hello World!) Tj
ET
50 600 m
400 600 l
S
endstream
endobj

PartialEvaluator (xRef, catalog, resources) ➟ OL:
setGState: [ LW: 10 ]
dependency: [ font0 ]
setFont: font0, 12
beginText
moveText: 100, 700
showText: “Hello World!”
endText
moveTo: 50, 600
lineTo: 400, 600
stroke

OL ➟ Graphics
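The translation from content stream to operator list can be imitated with a very small tokenizer. This is only a sketch covering the nine operators of the example above; unlike the real evaluator it does not resolve resources such as /GS1 or /F0 and emits no dependency entries, and `toOperatorList` is a made-up name:

```javascript
// Sketch only: a tiny subset of PDF content-stream operators.
// The real evaluator also resolves resources (e.g. /GS1, /F0) and emits dependencies.
const OPERATOR_NAMES = {
  gs: "setGState", Tf: "setFont", BT: "beginText", Td: "moveText",
  Tj: "showText", ET: "endText", m: "moveTo", l: "lineTo", S: "stroke",
};

function toOperatorList(contentStream) {
  // Tokens: (string literal) | /Name | number | operator keyword.
  const tokenPattern = /\((?:[^()\\]|\\.)*\)|\/\w+|-?\d+(?:\.\d+)?|[A-Za-z]+/g;
  const tokens = contentStream.match(tokenPattern) || [];
  const operatorList = [];
  let operands = [];
  for (const token of tokens) {
    if (token.startsWith("(")) {
      operands.push(token.slice(1, -1)); // string literal, escapes left as-is
    } else if (token.startsWith("/")) {
      operands.push(token.slice(1)); // resource name, not resolved here
    } else if (/^-?\d/.test(token)) {
      operands.push(Number(token));
    } else {
      // Operands precede their operator, so an operator closes the current group.
      const fn = OPERATOR_NAMES[token];
      if (!fn) throw new Error("unsupported operator: " + token);
      operatorList.push({ fn, args: operands });
      operands = [];
    }
  }
  return operatorList;
}

const stream = "/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S";
const ol = toOperatorList(stream);
console.log(ol.map(({ fn, args }) => fn + ": " + args.join(", ")).join("\n"));
```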
![Page 14: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/14.jpg)
Images
• JPEG streams:
• DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
• If not JPEG stream:
• read bytes, convert to colorspace
• imgData = canvas.getImageData()
• fillWithPixelData(bytes, imgData)
• canvas.putImageData(imgData)
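Both image paths can be sketched in a few lines. The helper names follow the slide, but the bodies are assumptions: the non-JPEG path here handles only 8-bit RGB input, and `imgData` is a plain object shaped like a canvas ImageData so the code also runs outside a browser:

```javascript
// Sketch only. `imgData` mimics the shape of a canvas ImageData object
// ({ width, height, data }) so the code also runs outside a browser.
function bytesToString(bytes) {
  let str = "";
  for (let i = 0; i < bytes.length; i++) {
    str += String.fromCharCode(bytes[i]);
  }
  return str;
}

// JPEG path: hand the undecoded stream to the browser as a data URL.
// btoa is a browser global (also present in recent Node.js versions).
function toJpegDataUrl(bytes) {
  return "data:image/jpeg;base64," + btoa(bytesToString(bytes));
}

// Non-JPEG path, assuming the bytes are already 8-bit RGB triples.
function fillWithPixelData(rgbBytes, imgData) {
  const out = imgData.data;
  for (let src = 0, dst = 0; src < rgbBytes.length; src += 3, dst += 4) {
    out[dst] = rgbBytes[src];
    out[dst + 1] = rgbBytes[src + 1];
    out[dst + 2] = rgbBytes[src + 2];
    out[dst + 3] = 255; // canvas pixels are RGBA; make each one fully opaque
  }
}

// One red and one blue pixel.
const imgData = { width: 2, height: 1, data: new Uint8ClampedArray(2 * 1 * 4) };
fillWithPixelData(new Uint8Array([255, 0, 0, 0, 0, 255]), imgData);
console.log(Array.from(imgData.data));
```

The JPEG path is cheap because the browser decodes the image natively; the other path touches every pixel in JavaScript.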
![Page 15: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/15.jpg)
JPEG, but...
• no native support for JPEG 2000, CMYK
➡ use JS implementation
‣ works, not that performant but good enough
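As a rough idea of the per-pixel work a JS implementation has to do for CMYK, here is the textbook naive conversion to RGB. It ignores color profiles and is not claimed to be the conversion PDF.JS uses; `cmykToRgb` is a made-up helper taking components in the range 0 to 1:

```javascript
// Sketch only: naive CMYK -> RGB, components given in the range 0..1.
// No color management; real decoders typically do something more accurate.
function cmykToRgb(c, m, y, k) {
  const r = Math.round(255 * (1 - c) * (1 - k));
  const g = Math.round(255 * (1 - m) * (1 - k));
  const b = Math.round(255 * (1 - y) * (1 - k));
  return [r, g, b];
}

console.log(cmykToRgb(0, 1, 1, 0)); // no cyan, full magenta and yellow
console.log(cmykToRgb(0, 0, 0, 1)); // full black
```

Running this for every pixel of a large image is the kind of work that makes a pure JS path slower than a native decoder.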
![Page 16: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/16.jpg)
Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS for loading: @font-face { font-family: 'font0'; src: url(data:font/opentype;base64, ...); }
• Fonts are sanitized by browser
• Need to rebuild malformed fonts :/
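The CSS loading step amounts to building a rule string around the base64 font data. A minimal sketch, assuming the font has already been converted to OpenType and encoded; `buildFontFaceRule` is a made-up helper and the sample data is a placeholder, not a real font:

```javascript
// Sketch only: build the CSS rule text for an already converted OpenType font.
// `fontName` and `base64Data` are supplied by the caller; nothing is validated here.
function buildFontFaceRule(fontName, base64Data) {
  const src = "url(data:font/opentype;base64," + base64Data + ")";
  return "@font-face { font-family: '" + fontName + "'; src: " + src + "; }";
}

// "T1RUTw==" is just the four bytes "OTTO" in base64, used as placeholder data.
const rule = buildFontFaceRule("font0", "T1RUTw==");
console.log(rule);
// In a browser the rule could then be added to a stylesheet, e.g.
// document.styleSheets[0].insertRule(rule, 0);
```

Since the browser sanitizes whatever arrives through this rule, a font that fails sanitizing is rejected, which is why malformed fonts need rebuilding first.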
![Page 17: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/17.jpg)
“Why are you doing this?”
aka. ∃ C/C++ libraries, isn’t that faster?
![Page 18: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/18.jpg)
“Performance is not the only measure”
![Page 19: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/19.jpg)
1. Security
![Page 20: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/20.jpg)
Most vulnerable programs
Source: http://www.csis.dk/en/csis/news/3321
![Page 21: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/21.jpg)
~ 25% of crashes in Firefox are plugin-related
![Page 22: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/22.jpg)
2. Web-Specific Viewer
![Page 23: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/23.jpg)
3. Drive Innovation
![Page 24: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/24.jpg)
4. Speed
![Page 25: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/25.jpg)
4. Speed
• Rendering slower than C/C++
• BUT
• Partial downloading
• Render page in background
• Make the slow parts faster
• Mostly: Good enough
![Page 26: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/26.jpg)
5. Can do better
![Page 27: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/27.jpg)
6. Push Web Platform
![Page 28: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/28.jpg)
B2G aka. Boot2Gecko
![Page 29: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/29.jpg)
![Page 30: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/30.jpg)
![Page 31: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/31.jpg)
![Page 32: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/32.jpg)
![Page 33: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/33.jpg)
New API: Printing
• Printing very limited on the web right now
• no way to achieve native printing experience
• NEED: New API for printing
• mozPrintCallback
• define canvas content during printing
• send drawing commands directly to printer
![Page 34: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/34.jpg)
Web Page
Print
Single Pages
![Page 35: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/35.jpg)
• Find print canvas on page
• Execute printCallback
• All canvas done ➠ print page
Page 1
Page 2
![Page 36: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/36.jpg)
canvas.mozPrintCallback
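A sketch of how a page might hook into this callback. mozPrintCallback is a non-standard, Firefox-only property; the callback object is assumed here to expose `context` and `done()`, and `renderPage` plus the fake canvas are stand-ins so the example runs anywhere:

```javascript
// Sketch only. mozPrintCallback is a non-standard, Firefox-only property; the
// callback receives an object exposing `context` and `done()`.
// `renderPage` is a hypothetical page-drawing function supplied by the caller.
function attachPrintCallback(canvas, renderPage) {
  canvas.mozPrintCallback = function (printState) {
    renderPage(printState.context); // drawing commands go to the print context
    printState.done();              // tell the browser this canvas is finished
  };
}

// Stand-ins so the sketch runs outside Firefox.
const fakeCanvas = {};
const log = [];
attachPrintCallback(fakeCanvas, (ctx) => ctx.fillText("Page 1", 10, 10));
fakeCanvas.mozPrintCallback({
  context: { fillText: (text) => log.push("fillText " + text) },
  done: () => log.push("done"),
});
console.log(log);
```

Calling `done()` is what lets the browser know that this canvas is finished, matching the "All canvas done ➠ print page" step.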
![Page 37: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/37.jpg)
Firefox Integration
![Page 38: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/38.jpg)
Firefox Integration
• PDF.JS as bundled Addon in Firefox Nightly
• Getting in Release Channel is hard
• 400M users have expectations
• more testing coverage
• accessibility
• match UX expectation
• fallback if something is not working
![Page 39: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/39.jpg)
Firefox Integration
• Try to make it by the Aurora merge (6/5)
• Firefox Specific, BUT
• quality improvements are browser-independent
• only small parts Firefox specific
![Page 40: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/40.jpg)
What’s next
• Fix broken PDFs
• Improve performance
• Improve Text selection
• Text search
• Form support
• Printing support
![Page 41: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/41.jpg)
Demo
![Page 42: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/42.jpg)
Contributing
• Lots of areas
• Translation
• Writing Code (embeddable viewer?)
• Testing (Firefox Auto-Update Addon)
![Page 43: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/43.jpg)
Github: https://github.com/mozilla/pdf.js
Twitter: @pdfjs
Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC: irc.mozilla.org #pdfjs
Engineering Weekly Call:
Thursday - 10:00am PDT
Readme, Issues, Wiki
![Page 44: PDF.JS at SwissJeese 2012](https://reader034.fdocuments.in/reader034/viewer/2022051816/546249fab4af9f621c8b46ae/html5/thumbnails/44.jpg)
Q & A