iText 7: PDF is dead, long live PDF
PDF made easy with iText 7
PDF is dead! Long live PDF!
Benoit Lagae, Developer, iText Software
Bruno Lowagie, Chief Strategy Officer, iText Group
Is PDF dead?
PDF specifications
Everybody uses HTML
Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/
But governments love PDF
Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/
Percentage of PDF files:
• .org: 15%
• .gov: 38%
• .edu: 27%
Publication versus …
• No need to be self-contained
• May change over time
• Not all content produced by the author
    • e.g. Advertisements
• Becoming more interactive
    • e.g. Comments on a news article
… Document
• Self-contained
• Unchanging (non-dynamic)
• Able to be authenticated
• Able to be secured/protected
Not counting HTML, PDF is king
Source: http://duff-johnson.com/2015/02/12/the-8-most-popular-document-formats-on-the-web-in-2015/
Publication: HTML depends on context
Document: PDF is forever
PDF/E (engineering): since 2008, ISO 24517
PDF/VT (printing): since 2010, ISO 16612
PDF/X (graphic arts): since 2001, ISO 15930
PDF/A (archive): since 2005, ISO 19005
PDF/UA (accessibility): since 2012, ISO 14289
PDF (Portable Document Format): first released by Adobe in 1993, ISO standard since 2008, ISO 32000
Related: XFDF (ISO), EcmaScript (ISO), PRC (ISO), PAdES (ETSI), ZUGFeRD
An umbrella of standards
iText 7: a PDF engine
Image example
Image fox = new Image(ImageFactory.getImage(FOX));
Image dog = new Image(ImageFactory.getImage(DOG));
Paragraph p = new Paragraph("The quick brown ")
    .add(fox)
    .add(" jumps over the lazy ")
    .add(dog);
document.add(p);
On the importance of making a document accessible
Can everyone read this?
Some structure is helpful
title
list item
list item
list item
Label Content
Can everyone read this?
How do we read a spider chart?
[Spider chart. Axis labels: Risk Management; Structured Finance; Mergers & acquisitions; Governance & Internal Control; Accounting Operations; Treasury operations; Management Information & Business Decision Support; Business Planning & Strategy; Finance Contribution to IT Management; Commercial Activities; Taxation; Functional Leadership]
Resolve abbreviations
What goes into rows / columns?
Make info color independent
Is this a better way to read data?
Adapting the ‘quick brown fox’ example for PDF/UA
PDF/UA (part 1)
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
// Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
// Create XMP metadata
pdf.createXmpMetadata();
PDF/UA (part 2)
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
// PDF/UA: Set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
// PDF/UA: Set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
Result
On the importance of making a document archivable
PDF/A
• ISO-19005
    – Long-term preservation of documents
    – Approved parts will never become invalid
    – Individual parts define new, useful features
• Obligations and restrictions
    – Metadata: ISO 16684, eXtensible Metadata Platform (XMP)
    – The document must be self-contained:
        • All fonts need to be embedded
        • No external movie, sound or other binary files
    – No JavaScript allowed
    – No encryption allowed
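The obligations above can be read as a checklist. The following is a minimal sketch of such a checklist in plain Java. The `DocumentFacts` class and its fields are hypothetical and only mirror the bullets on this slide; this is not an iText API and not a real PDF/A validator.

```java
import java.util.ArrayList;
import java.util.List;

final class PdfAChecklist {

    /** Hypothetical summary of a document; field names are illustrative only. */
    static final class DocumentFacts {
        boolean hasXmpMetadata;
        boolean allFontsEmbedded;
        boolean referencesExternalMedia;
        boolean containsJavaScript;
        boolean encrypted;
    }

    /** Returns one message per obligation from the slide that is not met. */
    static List<String> violations(DocumentFacts d) {
        List<String> problems = new ArrayList<>();
        if (!d.hasXmpMetadata) problems.add("XMP metadata is missing");
        if (!d.allFontsEmbedded) problems.add("not all fonts are embedded");
        if (d.referencesExternalMedia) problems.add("external movie, sound or binary file referenced");
        if (d.containsJavaScript) problems.add("JavaScript is not allowed");
        if (d.encrypted) problems.add("encryption is not allowed");
        return problems;
    }

    public static void main(String[] args) {
        DocumentFacts d = new DocumentFacts();
        d.hasXmpMetadata = true;
        d.allFontsEmbedded = false;
        d.encrypted = true;
        // Prints the two unmet obligations for this made-up document
        violations(d).forEach(System.out::println);
    }
}
```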
Three standards
• PDF/A-1 (2005)
    – Based on PDF 1.4
    – Level B (“basic”): visual appearance
    – Level A (“accessible”): visual appearance + structural and semantic properties (Tagged PDF)
• PDF/A-2 (2011)
    – Based on ISO-32000-1
    – Features introduced in PDF 1.5, 1.6, and 1.7:
        • Added support for JPEG2000, Collections, object-level XMP, optional content
        • Improved support for transparency, comment types and annotations, digital signatures
    – Level U (“unicode”): visual appearance + all text is in Unicode
• PDF/A-3 (2012)
    – Based on PDF/A-2 with only 1 difference: attachments do not need to be PDF/A
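As a rough illustration of how the three parts and their conformance levels relate, here is a small sketch in plain Java. The `Part` and `Level` enums are hypothetical names made up for this example; they are not the iText `PdfAConformanceLevel` type.

```java
import java.util.EnumSet;
import java.util.Set;

final class PdfAParts {

    enum Level { A, B, U }

    enum Part {
        // year, supported levels, whether attachments may be non-PDF/A files
        PDF_A_1(2005, EnumSet.of(Level.A, Level.B), false),
        PDF_A_2(2011, EnumSet.of(Level.A, Level.B, Level.U), false),
        PDF_A_3(2012, EnumSet.of(Level.A, Level.B, Level.U), true);

        final int year;
        final Set<Level> levels;
        final boolean allowsNonPdfAAttachments;

        Part(int year, Set<Level> levels, boolean allowsNonPdfAAttachments) {
            this.year = year;
            this.levels = levels;
            this.allowsNonPdfAAttachments = allowsNonPdfAAttachments;
        }

        boolean supports(Level level) {
            return levels.contains(level);
        }
    }

    public static void main(String[] args) {
        for (Part part : Part.values()) {
            System.out.println(part + " (" + part.year + "): levels " + part.levels
                    + ", arbitrary attachments: " + part.allowsNonPdfAAttachments);
        }
    }
}
```

Modeled this way, embedding a CSV file as in the later United States example only fits `PDF_A_3`.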
Adapting the ‘quick brown fox’ example for PDF/A
PDF/A-1b example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Create XMP metadata
pdf.createXmpMetadata();
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();
Resulting PDF/A-1b
PDF/A-1a example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
pdf.setTagged();
pdf.createXmpMetadata();
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
Resulting PDF/A-1a
Real-world use: publishing a CSV file as PDF/A-3a and PDF/UA
United States database
United States example, part 1: initializations
PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest), PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
// Setting some required parameters
pdf.setTagged();                                           // PDF/UA and PDF/A Level a
pdf.getCatalog().setLang(new PdfString("en-US"));          // PDF/UA
pdf.getCatalog().setViewerPreferences(                     // PDF/UA
    new PdfViewerPreferences().setDisplayDocTitle(true));  // PDF/UA
PdfDocumentInfo info = pdf.getDocumentInfo();              // PDF/UA
info.setTitle("iText7 PDF/A-3 example");                   // PDF/UA
// Create XMP metadata
pdf.createXmpMetadata();                                   // PDF/UA and PDF/A Level a
United States example, part 2: add attachment
// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
    "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);
United States example, part 3: parse CSV file
PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Parse a CSV file and add the data to a table
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
document.close();
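The `float[]` passed to the `Table` constructor above has nine entries, one per column. Assuming these values act as relative weights once the table is set to 100% width (an assumption about the layout engine, not something stated on the slide), the share of each column can be worked out with plain arithmetic. The helper below is a hypothetical illustration, not iText code.

```java
final class ColumnWidths {

    /** Converts relative column weights into percentages of the table width. */
    static float[] toPercentages(float[] relative) {
        float total = 0;
        for (float w : relative) {
            total += w;
        }
        if (total <= 0) {
            throw new IllegalArgumentException("weights must sum to a positive value");
        }
        float[] percentages = new float[relative.length];
        for (int i = 0; i < relative.length; i++) {
            percentages[i] = 100f * relative[i] / total;
        }
        return percentages;
    }

    public static void main(String[] args) {
        // Same weights as in the slide; they sum to 25
        float[] pct = toPercentages(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
        for (float p : pct) {
            System.out.println(p + "%");
        }
    }
}
```

Under that assumption, the widest columns (weight 4) take 16% each and the narrowest (weight 1) take 4% each.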
United States example, part 4: process each line
public void process(Table table, String line, PdfFont font, boolean isHeader) {
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    while (tokenizer.hasMoreTokens()) {
        if (isHeader) {
            table.addHeaderCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        } else {
            table.addCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        }
    }
}
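One detail worth knowing about the `StringTokenizer` approach above: it silently skips empty fields, so a row with a blank value would produce fewer cells than the header and shift the columns. The sketch below compares it with `String.split` using a negative limit, which keeps empty fields. The sample line is made up for illustration and is not taken from the actual data file; neither approach handles quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

final class CsvFields {

    /** Splits the way the slide does; empty fields are dropped. */
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ";");
        while (tokenizer.hasMoreTokens()) {
            fields.add(tokenizer.nextToken());
        }
        return fields;
    }

    /** Splits on ';' and keeps empty fields, including trailing ones. */
    static String[] withSplit(String line) {
        return line.split(";", -1);
    }

    public static void main(String[] args) {
        String line = "first;;third"; // hypothetical row with an empty middle field
        System.out.println(withTokenizer(line).size()); // 2
        System.out.println(withSplit(line).length);     // 3
    }
}
```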
United States example: result
Real-world use: ZUGFeRD, the future of invoicing
Invoices: Need to be archived
Invoices: Need to be accessible
Invoices: Need to be machine-readable
iText 7 and its value add-ons
New in iText 7: improved typography and support for Indic scripts
iText 5: missing links
Indic scripts:
• Only unsupported major script family
• Feature request #1
• Huge opportunity
    • Limited support in most other PDF libraries
Other features:
• Optional ligatures in Latin script
• Vowel diacritics in Arabic
Indic scripts: problems
• Lack of expertise
    • Unicode encodes 49 Indic scripts
    • Complex scripts with unique features
        • Glyph repositioning: ह + ि = हि
        • Glyph substitution: ம + ு = மு
        • Half-characters: त + ् + य = त्य
• Unsolvable issues for iText 5 font engine
    • No dedicated Unicode points for half-characters
    • No font lookups past ‘\uFFFF’
    • Ligaturization is context-dependent (virama)
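To make the ‘\uFFFF’ limitation concrete: a Java `char` is 16 bits wide, so any character above that value is stored as two `char` units (a surrogate pair). A lookup keyed on single `char` values cannot see such a character as one unit. The sketch below uses only the JDK; it illustrates the general issue and does not reproduce iText 5 internals. The code point 0x11005 was picked as an example from the Brahmi block.

```java
final class BeyondBmp {

    /** Number of 16-bit char units the string occupies. */
    static int charUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points, which is what a glyph lookup needs. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String bmp = new String(Character.toChars(0x0939));            // Devanagari letter HA
        String supplementary = new String(Character.toChars(0x11005)); // Brahmi block, above FFFF

        System.out.println(charUnits(bmp) + " unit(s), " + codePoints(bmp) + " code point(s)");
        System.out.println(charUnits(supplementary) + " unit(s), "
                + codePoints(supplementary) + " code point(s)");

        // Iterating by code point yields the full value instead of two surrogate halves
        supplementary.codePoints()
                .forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```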
Indic scripts: solutions
Writing a new font engine
• Automatic script recognition
    • Based on Unicode ranges
• Flexibility = extensibility
    • Generic Shaper class
    • Separate module, only called when necessary
• Glyph replacement rules
    • Different per writing system
    • Alternate glyphs are font-dependent
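Script recognition based on Unicode ranges can be approximated with the JDK alone. The sketch below uses `Character.UnicodeScript` to find the first script-specific code point in a string and then decides whether a shaping step would be needed. The `ScriptDetector` class and its list of "complex" scripts are assumptions made for this example; they do not describe how the iText 7 font engine is actually implemented.

```java
import java.lang.Character.UnicodeScript;
import java.util.EnumSet;
import java.util.Set;

final class ScriptDetector {

    // Hypothetical set of scripts for which a separate shaping module would be called
    static final Set<UnicodeScript> COMPLEX =
            EnumSet.of(UnicodeScript.DEVANAGARI, UnicodeScript.TAMIL, UnicodeScript.ARABIC);

    /** Returns the script of the first code point that is not script-neutral. */
    static UnicodeScript dominantScript(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                return script;
            }
            i += Character.charCount(cp);
        }
        return UnicodeScript.COMMON; // only digits, punctuation, spaces, etc.
    }

    /** True when the text would be handed to the (hypothetical) shaping module. */
    static boolean needsShaping(String text) {
        return COMPLEX.contains(dominantScript(text));
    }

    public static void main(String[] args) {
        System.out.println(dominantScript("\u0938\u093E\u0939\u093F")); // DEVANAGARI
        System.out.println(dominantScript("writer"));                   // LATIN
        System.out.println(needsShaping("writer"));                     // false
    }
}
```

Keeping the check this cheap is one way to get the "only called when necessary" behavior: plain Latin text never reaches the shaping code.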
Indic scripts: examples
PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);
String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
document.add(new Paragraph(txt).setFont(font));

String txt = "\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD\u0BA4\u0BBE\u0BB3\u0BB0\u0BCD"; // eluttaalar
document.add(new Paragraph(txt).setFont(font));
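The Devanagari string above contains a virama (U+094D) between two consonants, which is exactly the kind of sequence that calls for a half-character form. The sketch below locates such sequences with plain JDK calls. It is a simplified illustration: the consonant range U+0915 to U+0939 ignores nukta forms and other special cases, and the `ViramaFinder` class is hypothetical rather than part of iText.

```java
import java.util.ArrayList;
import java.util.List;

final class ViramaFinder {

    static final int VIRAMA = 0x094D; // DEVANAGARI SIGN VIRAMA

    /** Simplified test: basic Devanagari consonants KA (0915) through HA (0939). */
    static boolean isConsonant(int cp) {
        return cp >= 0x0915 && cp <= 0x0939;
    }

    /** Returns code point indexes where a consonant + virama + consonant run starts. */
    static List<Integer> conjunctStarts(String text) {
        int[] cps = text.codePoints().toArray();
        List<Integer> starts = new ArrayList<>();
        for (int i = 1; i + 1 < cps.length; i++) {
            if (cps[i] == VIRAMA && isConsonant(cps[i - 1]) && isConsonant(cps[i + 1])) {
                starts.add(i - 1);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
        // The run TA + virama + YA starts at code point index 4
        System.out.println(conjunctStarts(txt)); // [4]
    }
}
```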
Other scripts: examples
PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);
String txt = "\u0627\u0644\u0643\u0627\u062A\u0628"; // al-katibu
document.add(new Paragraph(txt).setFont(font));

String txt = "writer";
GlyphLine glyphLine = font.createGlyphLine(txt);
Shaper.applyLigaFeature(foglihtenNo07, glyphLine, null);
canvas.showText(glyphLine);
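For readers unfamiliar with ligature substitution, the idea can be shown at the character level with a toy rule table. Real fonts apply such rules to glyph IDs through OpenType lookup tables, and which ligatures exist depends on the font, so the sketch below is only an analogy. The `ToyLigatures` class and its rule set are made up for this example and use the Unicode presentation forms U+FB00 to U+FB04.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ToyLigatures {

    // Rules in priority order: longer sequences first so "ffi" wins over "ff" and "fi"
    static final Map<String, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put("ffi", "\uFB03");
        RULES.put("ffl", "\uFB04");
        RULES.put("ff", "\uFB00");
        RULES.put("fi", "\uFB01");
        RULES.put("fl", "\uFB02");
    }

    /** Replaces each matching character sequence with its ligature character. */
    static String apply(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            boolean matched = false;
            for (Map.Entry<String, String> rule : RULES.entrySet()) {
                if (text.startsWith(rule.getKey(), i)) {
                    out.append(rule.getValue());
                    i += rule.getKey().length();
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(apply("office").length()); // 4: "o", the ffi ligature, "c", "e"
        System.out.println(apply("writer"));          // unchanged, no rule matches
    }
}
```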
Status of advanced typography in iText 7
• Indic scripts
    • We already support:
        • Devanagari
        • Tamil
    • Coming soon:
        • Telugu
        • Others: based on customer demand
• Arabic
    • Support for vocalized Arabic (diacritics) is in development
• Latin
    • Optional ligatures are fully supported