2011 11-mozcamp-111115062121-phpapp02
Bespin / Skywriter
Ace
FirefoxDevTools
ETH Zurich
PDF.JS
?
Overview
• What is PDF.JS
• How PDF is structured
• Processing in PDF.JS
• Images & Fonts
• Infrastructure
• Problems & Todos
• Demo
What is PDF.JS
• building faithful & efficient PDF renderer
• HTML5 technology experiment
• no native code
• secure (web sandbox)
• Mozilla Labs Project - Open Source
Most vulnerable programs
Source: http://www.csis.dk/en/csis/news/3321
How PDF is structured
PDF file:
• Header: PDF version
• Body [Objects]: sequence of objects (fonts, drawing cmds, images, words, bookmarks, form fields)
• xRef Table: mapping objID ⇔ byte offset
• Trailer: root objID, xRef byte offset (root obj = ref to pages catalog)
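A minimal sketch of how a reader could locate these parts in the raw bytes. It assumes a well-formed file that starts with "%PDF-" plus the version and ends with the keyword startxref, the byte offset of the xRef table, and "%%EOF". The function names readHeaderVersion and findXRefOffset are hypothetical and not taken from PDF.JS, and the sample string is only a stand-in, not a complete PDF.

```javascript
// Decode a byte range as Latin-1 text (enough for the ASCII keywords of a PDF).
function bytesToText(bytes, start, end) {
  let s = '';
  for (let i = start; i < end; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// Header: the file starts with "%PDF-" followed by the version, e.g. "%PDF-1.4".
function readHeaderVersion(bytes) {
  const head = bytesToText(bytes, 0, Math.min(bytes.length, 16));
  const m = /^%PDF-(\d+\.\d+)/.exec(head);
  return m ? m[1] : null;
}

// Trailer: near the end of the file, "startxref" is followed by the byte
// offset of the xRef table. Only the last 1024 bytes are searched here.
function findXRefOffset(bytes) {
  const tailStart = Math.max(0, bytes.length - 1024);
  const tail = bytesToText(bytes, tailStart, bytes.length);
  const idx = tail.lastIndexOf('startxref');
  if (idx < 0) return null;
  const m = /^startxref\s+(\d+)/.exec(tail.slice(idx));
  return m ? parseInt(m[1], 10) : null;
}

// Tiny stand-in for a PDF file (not a complete, valid document).
const sample =
  '%PDF-1.4\n1 0 obj\n<< >>\nendobj\nxref\n0 1\n' +
  'trailer\n<< /Root 1 0 R >>\nstartxref\n30\n%%EOF\n';
const bytes = Uint8Array.from(sample, c => c.charCodeAt(0));
console.log(readHeaderVersion(bytes), findXRefOffset(bytes));
```

Reading from the end of the file is what makes the xRef table useful: a viewer can jump straight to the objects it needs instead of scanning the whole body.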
Processing in PDF.JS
• get plain Uint8Array via XHR2, build Stream
• new PDFDoc(stream): read xRef, root object
• page = PDFDoc.getPage(N)
• page.startRendering(graphics)
• read & convert all PDF cmds ➟ IR (Intermediate Representation, built by PartialEvaluator)
• load required objects (fonts, images)
• graphics.executeIR(IR) (graphics: CanvasGraphics)
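The first step can be sketched as follows. XMLHttpRequest with responseType 'arraybuffer' is the standard XHR2 way to get binary data. The ByteStream class is a hypothetical, much reduced stand-in for the Stream mentioned above, not the PDF.JS implementation, and the URL in the usage comment is a placeholder.

```javascript
// Fetch a file as raw bytes. XMLHttpRequest Level 2 allows responseType
// 'arraybuffer', so the response arrives as an ArrayBuffer instead of text.
function fetchBytes(url, onDone) {
  const xhr = new XMLHttpRequest();
  xhr.open('GET', url);
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    onDone(new Uint8Array(xhr.response));
  };
  xhr.send();
}

// Hypothetical minimal stream: a read position on top of a Uint8Array.
class ByteStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }
  // Next byte, or -1 at the end of the data.
  getByte() {
    return this.pos < this.bytes.length ? this.bytes[this.pos++] : -1;
  }
  // Up to `length` bytes as a view on the same buffer (no copy).
  getBytes(length) {
    const end = Math.min(this.pos + length, this.bytes.length);
    const chunk = this.bytes.subarray(this.pos, end);
    this.pos = end;
    return chunk;
  }
}

// Browser usage (the URL is a placeholder):
// fetchBytes('file.pdf', bytes => { const stream = new ByteStream(bytes); });
```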
Why IR?
[Diagram: Data, PartialEvaluator, Graphics]
• “get page 2”: PartialEvaluator reads Data and builds drawing cmds: draw(obj#3, dict.x, dict.y)
• Graphics gets the drawing cmds and has to ask Data: obj#3? dict.x, .y?
• Data answers: obj#3 = “foo”, x = 20, y = 30
• Graphics: draw on canvas
Problem Processing
• Extracting data slow (compressed)
• Transform data (images) slow
• Sometimes a lot of objects on page
➡ Freezes UI
➡ Use WebWorker
➡ :( no direct memory access, only postMessage
[Diagram: Main Thread ⇔ Web Worker]
• Main Thread: sends data and “get page 2” to the Web Worker
• Web Worker: PartialEvaluator reads Data and builds draw(obj#3, dict.x, dict.y)
• Web Worker: references resolved against Data ➟ IR: draw(“foo”, 20, 30)
• Web Worker: sends IR cmds to the Main Thread
• Main Thread: Graphics executes the IR cmds, draw on canvas
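The idea behind the IR can be sketched in a few lines: the side that owns the data replaces every reference with its value, so the resulting commands can be copied to another thread and executed there without access to the data. All names here (resolveToIR, sendToMainThread, the shape of the command objects) are hypothetical. The JSON round trip only stands in for postMessage, which also hands the receiver a copy.

```javascript
// Worker side: the document data lives here (values from the slide).
const data = {
  objects: { 3: 'foo' },
  dict: { x: 20, y: 30 }
};

// Unresolved drawing command as built for the page: it still points into `data`.
const unresolved = [
  { fn: 'draw', args: [{ ref: 3 }, { key: 'x' }, { key: 'y' }] }
];

// Replace every reference by its value so the IR contains plain values only.
function resolveToIR(cmds, data) {
  return cmds.map(cmd => ({
    fn: cmd.fn,
    args: cmd.args.map(arg => {
      if (arg !== null && typeof arg === 'object') {
        if ('ref' in arg) return data.objects[arg.ref];
        if ('key' in arg) return data.dict[arg.key];
      }
      return arg;
    })
  }));
}

// Stand-in for postMessage: the receiver only ever sees a copy of the message.
function sendToMainThread(message) {
  return JSON.parse(JSON.stringify(message));
}

// Main-thread side: run the IR against a graphics object, no access to `data`.
function executeIR(ir, graphics) {
  for (const cmd of ir) graphics[cmd.fn](...cmd.args);
}

// Recording graphics object instead of a real canvas.
const calls = [];
const graphics = { draw: (text, x, y) => calls.push([text, x, y]) };
executeIR(sendToMainThread(resolveToIR(unresolved, data)), graphics);
console.log(calls);
```

Because the IR holds plain values only, it survives the copy between threads unchanged, which is exactly what the unresolved draw(obj#3, dict.x, dict.y) form cannot do.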
PDF content stream (input to PartialEvaluator, + xRef, catalog, resources):
5 0 obj
<< /Length 8 0 R >>
stream
/GS1 gs
/F0 12 Tf
BT
100 700 Td
(Hello World!) Tj
ET
50 600 m
400 600 l
S
endstream
endobj

IR (output of PartialEvaluator, input to Graphics):
setGState: [ LW: 10 ]
dependency: [ font0 ]
setFont: font0, 12
beginText
moveText: 100, 700
showText: “Hello World!”
endText
moveTo: 50, 600
lineTo: 400, 600
stroke
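A rough sketch of the translation step for the stream shown above. PDF content streams are postfix: operands come first, the operator last. The operator table reuses the IR names from the slide. The tokenizer is a deliberate simplification (no escapes or nested parentheses in strings, no arrays or dictionaries), and unlike the real PartialEvaluator it does not look anything up in the resources, so it neither expands /GS1 into [ LW: 10 ] nor emits the font dependency.

```javascript
// Hypothetical operator table: PDF operator -> IR command name (names from the slide).
const OP_NAMES = {
  gs: 'setGState', Tf: 'setFont', BT: 'beginText', Td: 'moveText',
  Tj: 'showText', ET: 'endText', m: 'moveTo', l: 'lineTo', S: 'stroke'
};

// Split a content stream into tokens: (strings), /names, numbers, operators.
function tokenize(src) {
  return src.match(/\([^)]*\)|\/[^\s()\/]+|[^\s()\/]+/g) || [];
}

// Collect operands until an operator shows up, then emit one IR command.
function toIR(src) {
  const ir = [];
  let args = [];
  for (const tok of tokenize(src)) {
    if (tok[0] === '(') {
      args.push(tok.slice(1, -1));          // string operand
    } else if (tok[0] === '/') {
      args.push(tok.slice(1));              // name operand, e.g. F0
    } else if (/^[+-]?(\d+\.?\d*|\.\d+)$/.test(tok)) {
      args.push(parseFloat(tok));           // numeric operand
    } else {
      // Anything else is an operator: it consumes the operands collected so far.
      ir.push({ fn: OP_NAMES[tok] || tok, args });
      args = [];
    }
  }
  return ir;
}

const stream =
  '/GS1 gs /F0 12 Tf BT 100 700 Td (Hello World!) Tj ET 50 600 m 400 600 l S';
for (const cmd of toIR(stream)) console.log(cmd.fn, cmd.args);
```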
Images
• JPEG streams:
  • DOMImg.src = 'data:image/jpeg;base64,' + window.btoa(bytesToString(bytes));
• If not JPEG stream:
  • read bytes, convert to colorspace
  • imgData = canvas.getImageData()
  • fillWithPixelData(bytes, imgData)
  • canvas.putImageData(imgData)
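Both paths can be sketched like this. The helper names bytesToString and fillWithPixelData come from the slide, but their bodies here are assumptions: in particular, fillWithPixelData is assumed to receive pixels that are already converted to 8-bit RGB triples. The canvas calls in the closing comment are the standard 2D context API.

```javascript
// Turn raw bytes into a binary string, one character per byte (input for btoa).
function bytesToString(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return s;
}

// JPEG stream: hand the untouched bytes to the browser's own decoder.
function jpegDataUrl(bytes) {
  return 'data:image/jpeg;base64,' + btoa(bytesToString(bytes));
}

// Other images: copy decoded pixels into ImageData-like storage.
// Assumption: `rgb` holds 8-bit R, G, B triples; ImageData wants R, G, B, A.
function fillWithPixelData(rgb, imgData) {
  const out = imgData.data;
  for (let i = 0, j = 0; i < rgb.length; i += 3, j += 4) {
    out[j] = rgb[i];
    out[j + 1] = rgb[i + 1];
    out[j + 2] = rgb[i + 2];
    out[j + 3] = 255; // fully opaque
  }
  return imgData;
}

// In a browser (ctx is a 2D canvas context):
//   img.src = jpegDataUrl(jpegBytes);
//   const imgData = ctx.createImageData(width, height);
//   ctx.putImageData(fillWithPixelData(rgb, imgData), 0, 0);
```

The JPEG path is cheap because no pixel ever passes through JavaScript; the second path touches every pixel, which is one reason image transformation shows up as slow.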
Jpeg, but...
• no native support for CMYK Jpeg
➡ use JS implementation
• no native support for Jpeg 2000
➡ use Emscripten: C-Lib ➟ JS
‣ works, but not that performant
Fonts
• There are lots of different font formats!
• fonts are converted to OpenType
• use CSS: @font-face { font-family:'font0'; src:url(data:font/opentype;base64, ...) }
• some fonts can’t be converted :(
  • paint them

Fonts
Type I     convert to Type II
Type II    “use directly”
Type III   paint ourselves
CID        convert to Type II
➡ still need to repair fonts!
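A small sketch of the CSS step, assuming the font has already been converted and is available as OpenType bytes. The function names toBase64 and fontFaceRule are hypothetical; the rule text follows the slide.

```javascript
// Base64-encode raw bytes (btoa expects a binary string).
function toBase64(bytes) {
  let s = '';
  for (let i = 0; i < bytes.length; i++) s += String.fromCharCode(bytes[i]);
  return btoa(s);
}

// Build the @font-face rule for a converted OpenType font.
function fontFaceRule(fontName, otfBytes) {
  const url = 'data:font/opentype;base64,' + toBase64(otfBytes);
  return "@font-face { font-family:'" + fontName + "'; src:url(" + url + "); }";
}

// In a browser the rule can be added through a <style> element:
//   const style = document.createElement('style');
//   style.textContent = fontFaceRule('font0', otfBytes);
//   document.head.appendChild(style);
// Text drawn afterwards with ctx.font = "12px 'font0'" uses the embedded font
// once the browser has finished loading it.
```

Loading is asynchronous and there is no event that signals when the font is ready, which is why "Font Load Event" appears on the todo list.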
Infrastructure
• Using GitHub
  • Issue Tracker
  • Pull Requests
  • Wiki
• Update gh-pages on every push
• Testing:
  • In Pull Request: “@pdfjsbot test”
  • Runs tests on EC2 instance
Infrastructure
• AreWePdfYet?
• Take top100 PDFs from Google
• render the first 5 pages each
• compare to Preview
• http://people.mozilla.com/~bdahl/corpusreport/test/ref/
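The slides do not say how the comparison against Preview is done. One simple possibility, shown here purely as an assumption, is to render both versions to bitmaps of the same size and count the pixels that differ by more than a tolerance. The function name countDifferentPixels is hypothetical.

```javascript
// Count pixels whose R, G, B or A value differs by more than `tolerance`.
// `a` and `b` are RGBA byte arrays of equal length, e.g. ImageData.data.
function countDifferentPixels(a, b, tolerance) {
  if (a.length !== b.length) throw new Error('size mismatch');
  let different = 0;
  for (let i = 0; i < a.length; i += 4) {
    for (let c = 0; c < 4; c++) {
      if (Math.abs(a[i + c] - b[i + c]) > tolerance) {
        different++;
        break; // count each pixel once
      }
    }
  }
  return different;
}

// Two 2-pixel images; the second pixel differs in the green channel.
const rendered = new Uint8ClampedArray([0, 0, 0, 255, 10, 10, 10, 255]);
const reference = new Uint8ClampedArray([0, 0, 0, 255, 10, 40, 10, 255]);
console.log(countDifferentPixels(rendered, reference, 5));
```

A tolerance above zero keeps small anti-aliasing differences between renderers from counting as failures.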
Todo = Help :)
Worker Canvas
'Read-Only' Memory Web Worker
Faster Canvas Rendering
CMYK Jpeg
Jpeg2000
Font Load Event
WebPrint API
XHR Range Support
Font Support
Parallel Web Worker
SVG Backend
(text selection [Gecko])
“HTML5” Backend
Search | Selection | Copy
Input Forms
More Parts Of Spec
Improve Viewer
Perf & Memory Analysis
Improve Test Infrastructure
More Testing!
More Testing
• use PDF.JS extension!
• http://mozilla.github.com/pdf.js/extensions/firefox/pdf.js.xpi
• report broken PDFs!
• help us categorize issues
Feedback Feature
Demo
Github: https://github.com/mozilla/pdf.js
Twitter: @pdfjs
Mailing List: https://groups.google.com/group/mozilla.dev.pdf-js/topics
IRC: irc.mozilla.org #pdfjs
Engineering Weekly Call:
Thursday - 10:00am PDT, 17:00 UTC
Readme | Issues | Wiki
Q & A