iText 7: PDF is dead, long live PDF

PDF made easy with iText 7 PDF is dead! Long live PDF! Benoit Lagae, Developer, iText Software Bruno Lowagie, Chief Strategy Officer, iText Group

Transcript of iText 7: PDF is dead, long live PDF

Page 1: iText 7: PDF is dead, long live PDF

PDF made easy with iText 7

PDF is dead! Long live PDF!

Benoit Lagae, Developer, iText Software

Bruno Lowagie, Chief Strategy Officer, iText Group

Page 2: iText 7: PDF is dead, long live PDF

Is PDF dead?

Page 3: iText 7: PDF is dead, long live PDF

PDF specifications

Page 4: iText 7: PDF is dead, long live PDF

Everybody uses HTML

Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/

Page 5: iText 7: PDF is dead, long live PDF

But governments love PDF

Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/

Percentage of PDF files:
• .org: 15%
• .gov: 38%
• .edu: 27%

Page 6: iText 7: PDF is dead, long live PDF

Publication versus …

• No need to be self-contained
• May change over time
• Not all content produced by the author
  • e.g. advertisements
• Becoming more interactive
  • e.g. comments on a news article

Page 7: iText 7: PDF is dead, long live PDF

… Document

• Self-contained
• Unchanging (non-dynamic)
• Able to be authenticated
• Able to be secured/protected

Page 8: iText 7: PDF is dead, long live PDF

Not counting HTML, PDF is king

Source: http://duff-johnson.com/2015/02/12/the-8-most-popular-document-formats-on-the-web-in-2015/

Page 9: iText 7: PDF is dead, long live PDF

Publication: HTML depends on context

Document: PDF is forever

Page 10: iText 7: PDF is dead, long live PDF

An umbrella of standards

PDF (Portable Document Format): first released by Adobe in 1993, ISO standard since 2008 (ISO 32000)

• PDF/X (graphic arts): since 2001, ISO 15930
• PDF/A (archive): since 2005, ISO 19005
• PDF/E (engineering): since 2008, ISO 24517
• PDF/VT (printing): since 2010, ISO 16612
• PDF/UA (accessibility): since 2012, ISO 14289

Related: XFDF (ISO), ECMAScript (ISO), PRC (ISO), PAdES (ETSI), ZUGFeRD

Page 11: iText 7: PDF is dead, long live PDF

iText 7: a PDF engine

Page 12: iText 7: PDF is dead, long live PDF

Image example

Image fox = new Image(ImageFactory.getImage(FOX));
Image dog = new Image(ImageFactory.getImage(DOG));
// Text and images are chained into a single paragraph
Paragraph p = new Paragraph("The quick brown ")
    .add(fox)
    .add(" jumps over the lazy ")
    .add(dog);
document.add(p);

Page 13: iText 7: PDF is dead, long live PDF

On the importance of making a document accessible

Page 14: iText 7: PDF is dead, long live PDF

Can everyone read this?

Page 15: iText 7: PDF is dead, long live PDF

Some structure is helpful

title

list item

list item

list item

Label Content

Page 16: iText 7: PDF is dead, long live PDF

Can everyone read this?

Page 17: iText 7: PDF is dead, long live PDF

How do we read a spider chart?

[Spider chart. Axis labels: Risk Management, Structured Finance, Mergers & acquisitions, Governance & Internal Control, Accounting Operations, Treasury operations, Management Information & Business Decision Support, Business Planning & Strategy, Finance Contribution to IT Management, Commercial Activities, Taxation, Functional Leadership]

• Resolve abbreviations
• What goes into rows / columns?
• Make info color independent

Page 18: iText 7: PDF is dead, long live PDF

Is this a better way to read data?

Page 19: iText 7: PDF is dead, long live PDF

Adapting the ‘quick brown fox’ example for PDF/UA

Page 20: iText 7: PDF is dead, long live PDF

PDF/UA (part 1)

PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);

// Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
// Create XMP metadata
pdf.createXmpMetadata();

Page 21: iText 7: PDF is dead, long live PDF

PDF/UA (part 2)

// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
// PDF/UA: set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
// PDF/UA: set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

Page 22: iText 7: PDF is dead, long live PDF

Result

Page 23: iText 7: PDF is dead, long live PDF

On the importance of making a document archivable

Page 24: iText 7: PDF is dead, long live PDF

PDF/A

• ISO 19005
  – Long-term preservation of documents
  – Approved parts will never become invalid
  – Individual parts define new, useful features
• Obligations and restrictions
  – Metadata: ISO 16684, Extensible Metadata Platform (XMP)
  – The document must be self-contained:
    • All fonts need to be embedded
    • No external movie, sound or other binary files
  – No JavaScript allowed
  – No encryption allowed

Page 25: iText 7: PDF is dead, long live PDF

Three standards

• PDF/A-1 (2005)
  – Based on PDF 1.4
  – Level B (“basic”): visual appearance
  – Level A (“accessible”): visual appearance + structural and semantic properties (Tagged PDF)
• PDF/A-2 (2011)
  – Based on ISO 32000-1
  – Features introduced in PDF 1.5, 1.6, and 1.7:
    • Added support for JPEG2000, Collections, object-level XMP, optional content
    • Improved support for transparency, comment types and annotations, digital signatures
  – Level U (“Unicode”): visual appearance + all text is in Unicode
• PDF/A-3 (2012)
  – Based on PDF/A-2 with only 1 difference: attachments do not need to be PDF/A
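
The three parts and their conformance levels can be summarized as a small lookup structure. This is only an illustrative sketch of what the slide lists: the names PdfAParts, Part and Level are made up for this example and are not iText classes, and it assumes PDF/A-3 keeps the same levels as PDF/A-2 because the slide names attachments as the only difference.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** Illustrative model of the PDF/A parts listed above (hypothetical names, not iText API). */
final class PdfAParts {

    /** Conformance levels: A = accessible, B = basic, U = Unicode. */
    enum Level { A, B, U }

    enum Part {
        PDF_A_1(2005, EnumSet.of(Level.A, Level.B), false),
        PDF_A_2(2011, EnumSet.allOf(Level.class), false),
        // Assumption: same levels as PDF/A-2; only the attachment rule changes.
        PDF_A_3(2012, EnumSet.allOf(Level.class), true);

        final int year;
        final Set<Level> levels;
        final boolean allowsNonPdfAAttachments;

        Part(int year, Set<Level> levels, boolean allowsNonPdfAAttachments) {
            this.year = year;
            this.levels = Collections.unmodifiableSet(levels);
            this.allowsNonPdfAAttachments = allowsNonPdfAAttachments;
        }

        /** True if this part defines the given conformance level. */
        boolean supports(Level level) {
            return levels.contains(level);
        }
    }

    public static void main(String[] args) {
        for (Part part : Part.values()) {
            System.out.println(part + " (" + part.year + "): levels " + part.levels
                    + ", non-PDF/A attachments allowed: " + part.allowsNonPdfAAttachments);
        }
    }
}
```

Read this way, publishing a CSV file inside an archivable PDF points to PDF/A-3, since it is the only part in this table that accepts attachments that are not PDF/A themselves.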

Page 26: iText 7: PDF is dead, long live PDF

Adapting the ‘quick brown fox’ example for PDF/A

Page 27: iText 7: PDF is dead, long live PDF

PDF/A-1b example

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Create XMP metadata
pdf.createXmpMetadata();
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();

Page 28: iText 7: PDF is dead, long live PDF

Resulting PDF/A-1b

Page 29: iText 7: PDF is dead, long live PDF

PDF/A-1a example

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Level A requires a tagged PDF
pdf.setTagged();
pdf.createXmpMetadata();
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();

Page 30: iText 7: PDF is dead, long live PDF

Resulting PDF/A-1a

Page 31: iText 7: PDF is dead, long live PDF

Real-world use: publishing a CSV file as PDF/A-3a and PDF/UA

Page 32: iText 7: PDF is dead, long live PDF

United States database

Page 33: iText 7: PDF is dead, long live PDF

United States example, part 1: initializations

PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
// Setting some required parameters
pdf.setTagged();                                            // PDF/UA and PDF/A Level a
pdf.getCatalog().setLang(new PdfString("en-US"));           // PDF/UA
pdf.getCatalog().setViewerPreferences(                      // PDF/UA
    new PdfViewerPreferences().setDisplayDocTitle(true));   // PDF/UA
PdfDocumentInfo info = pdf.getDocumentInfo();               // PDF/UA
info.setTitle("iText7 PDF/A-3 example");                    // PDF/UA
// Create XMP metadata
pdf.createXmpMetadata();                                    // PDF/UA and PDF/A Level a

Page 34: iText 7: PDF is dead, long live PDF

United States example, part 2: add attachment

// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)), "united_states.csv",
    "united_states.csv", new PdfName("text/csv"), parameters,
    PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
// Register the attachment as an associated file in the catalog
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);

Page 35: iText 7: PDF is dead, long live PDF

United States example, part 3: parse CSV file

PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Parse the CSV file and add its data to a table
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
// The first line holds the column headers
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
document.close();

Page 36: iText 7: PDF is dead, long live PDF

United States example, part 4: process each line

public void process(Table table, String line, PdfFont font, boolean isHeader) {
    // Fields are separated by semicolons
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    while (tokenizer.hasMoreTokens()) {
        if (isHeader) {
            table.addHeaderCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        } else {
            table.addCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        }
    }
}
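
One thing to watch when reusing the tokenizer approach above: java.util.StringTokenizer treats consecutive delimiters as one, so an empty field silently disappears and every later cell in that row shifts one column to the left. The sketch below uses only the JDK and contrasts it with String.split and a negative limit, which keeps empty fields. The class name CsvFieldSplit and the sample row are made up for illustration, and neither variant handles quoted fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

/** Compares two ways of splitting a semicolon-separated line (illustrative helper, not iText API). */
final class CsvFieldSplit {

    /** Splits like the process() method on the slide: empty fields are skipped. */
    static List<String> withTokenizer(String line) {
        List<String> fields = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ";");
        while (tokenizer.hasMoreTokens()) {
            fields.add(tokenizer.nextToken());
        }
        return fields;
    }

    /** Splits on every ';' and keeps empty fields, including trailing ones. */
    static List<String> keepingEmptyFields(String line) {
        return Arrays.asList(line.split(";", -1));
    }

    public static void main(String[] args) {
        // Hypothetical row whose third field is empty
        String line = "Alabama;AL;;Montgomery";
        System.out.println(withTokenizer(line).size() + " fields: " + withTokenizer(line));
        System.out.println(keepingEmptyFields(line).size() + " fields: " + keepingEmptyFields(line));
    }
}
```

If the data file never contains empty fields, the tokenizer version is fine as it stands; otherwise adding one cell per element of the split result keeps the nine table columns aligned.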

Page 37: iText 7: PDF is dead, long live PDF

United States example: result

Page 38: iText 7: PDF is dead, long live PDF

United States example: result

Page 39: iText 7: PDF is dead, long live PDF

Real-world use: ZUGFeRD, the future of invoicing

Page 40: iText 7: PDF is dead, long live PDF

Invoices: need to be archived

Page 41: iText 7: PDF is dead, long live PDF

Invoices: need to be accessible

Page 42: iText 7: PDF is dead, long live PDF

Invoices: need to be machine-readable

Page 43: iText 7: PDF is dead, long live PDF

Invoices: need to be machine-readable

Page 44: iText 7: PDF is dead, long live PDF

iText 7 and its value add-ons

Page 45: iText 7: PDF is dead, long live PDF

New in iText 7: improved typography and support for Indic scripts

Page 46: iText 7: PDF is dead, long live PDF

iText 5: missing links

Indic scripts:
• Only unsupported major script family
• Feature request #1
• Huge opportunity
  • Limited support in most other PDF libraries

Other features:
• Optional ligatures in Latin script
• Vowel diacritics in Arabic

Page 47: iText 7: PDF is dead, long live PDF

Indic scripts: problems

• Lack of expertise
  • Unicode encodes 49 Indic scripts
  • Complex scripts with unique features
    • Glyph repositioning: ह + ि = हि
    • Glyph substitution: ம + ு = மு
    • Half-characters: त + ् + य = त्य
• Unsolvable issues for iText 5 font engine
  • No dedicated Unicode points for half-characters
  • No font lookups past ‘\uFFFF’
  • Ligaturization is context-dependent (virama)
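
The ‘\uFFFF’ limit matters because some scripts, Brahmi among them, are encoded above U+FFFF, where one Java char can no longer hold a whole character. The following JDK-only sketch shows the difference between UTF-16 code units and code points. The class name BeyondBmp is illustrative, and reading the iText 5 limitation as a consequence of char-based lookups is an assumption, not something the slides state.

```java
/** Shows why a lookup keyed on a 16-bit char cannot address characters above U+FFFF. */
final class BeyondBmp {

    /** Builds a string from Unicode code points, using surrogate pairs where needed. */
    static String fromCodePoints(int... codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        // U+11005 BRAHMI LETTER A lies outside the Basic Multilingual Plane
        String brahmiA = fromCodePoints(0x11005);

        // Number of UTF-16 code units (chars) versus number of code points
        System.out.println("chars:       " + brahmiA.length());
        System.out.println("code points: " + brahmiA.codePointCount(0, brahmiA.length()));

        // Iterating by code point recovers the real character value
        brahmiA.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

A font engine that indexes glyphs by code point rather than by char can address such characters directly.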

Page 48: iText 7: PDF is dead, long live PDF

Indic scripts: solutions

Writing a new font engine
• Automatic script recognition
  • Based on Unicode ranges
• Flexibility = extensibility
  • Generic Shaper class
  • Separate module, only called when necessary
• Glyph replacement rules
  • Different per writing system
  • Alternate glyphs are font-dependent
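
Script recognition based on Unicode ranges can be approximated with the JDK alone, using Character.UnicodeScript. This is a minimal sketch of the idea, not the iText 7 implementation: the class ScriptDetector and its detect method are hypothetical names, and returning the first script-specific character is a simplification that ignores mixed-script text.

```java
import java.lang.Character.UnicodeScript;

/** Guesses the writing system of a string from its code points (illustrative, not iText API). */
final class ScriptDetector {

    /**
     * Returns the script of the first code point that belongs to a specific script.
     * Punctuation, digits and combining marks shared between scripts are skipped.
     * Returns COMMON if no script-specific character is found.
     */
    static UnicodeScript detect(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                return script;
            }
            i += Character.charCount(codePoint);
        }
        return UnicodeScript.COMMON;
    }

    public static void main(String[] args) {
        String[] samples = {
            "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930",
            "\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD\u0BA4\u0BBE\u0BB3\u0BB0\u0BCD",
            "\u0627\u0644\u0643\u0627\u062A\u0628",
            "writer"
        };
        for (String sample : samples) {
            System.out.println(detect(sample));
        }
    }
}
```

With a result like this, a layout engine could decide whether a shaping module needs to be called at all, which matches the "only called when necessary" point above.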

Page 49: iText 7: PDF is dead, long live PDF

Indic scripts: examples

PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);

// Devanagari
String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
document.add(new Paragraph(txt).setFont(font));

// Tamil
txt = "\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD\u0BA4\u0BBE\u0BB3\u0BB0\u0BCD"; // eluttaalar
document.add(new Paragraph(txt).setFont(font));

Page 50: iText 7: PDF is dead, long live PDF

Other scripts: examples

PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);

// Arabic
String txt = "\u0627\u0644\u0643\u0627\u062A\u0628"; // al-katibu
document.add(new Paragraph(txt).setFont(font));

// Latin with optional ligatures
txt = "writer";
GlyphLine glyphLine = font.createGlyphLine(txt);
Shaper.applyLigaFeature(foglihtenNo07, glyphLine, null);
canvas.showText(glyphLine);

Page 51: iText 7: PDF is dead, long live PDF

Status of advanced typography in iText 7

• Indic scripts
  • We already support:
    • Devanagari
    • Tamil
  • Coming soon:
    • Telugu
    • Others: based on customer demand
• Arabic
  • Support for vocalized Arabic (diacritics) is in development
• Latin
  • Optional ligatures are fully supported