The PDF Is the Enemy (but It Doesn't Have to Be)

What You’ll Learn Today

How to “deconstruct” PDFs and share them in ways that make them more open and usable

What limitations PDFs present, and why we need to shift the way we share data

Ways that other nonprofits and foundations are sharing data in free, open, and repurposable ways

Limitations of the PDF

One-format-fits-all! The PDF takes…

.txt, .csv, .xls, .jpeg, .png, .dbf, .doc, etc.

And turns it into…

.pdf

• Prevents data analysis

• Some PDFs do not allow you to select text

• Formatting limitations

• Limited searchability

• Inability to export charts/tables

• Cannot aggregate across documents

Beyond the PDF: Releasing Data

• Provide a downloadable link to the data

• Develop a data portal for users

• Create HTML/CSS tables that link back to PDFs or original data

• For developers and technologists, release the data through APIs
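As a rough illustration of the options above, the following Python sketch publishes one small table three ways: as a downloadable CSV, as an HTML table that links back to the source PDF, and as JSON of the kind a simple read-only API might return. The rows, the field names, and the PDF file names are hypothetical placeholders, not data from any real report.

```python
import csv
import html
import io
import json

# Hypothetical example rows; in practice these would come from the
# spreadsheet or database behind the report's charts and tables.
ROWS = [
    {"year": "2012", "grants": "14", "source_pdf": "report-2012.pdf"},
    {"year": "2013", "grants": "19", "source_pdf": "report-2013.pdf"},
]
FIELDS = ["year", "grants", "source_pdf"]


def to_csv(rows, fields):
    """Serialize rows as CSV text suitable for a downloadable file."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_html_table(rows, fields, link_field="source_pdf"):
    """Render rows as an HTML table, linking one column back to the PDF."""
    head = "".join("<th>{}</th>".format(html.escape(f)) for f in fields)
    body_rows = []
    for row in rows:
        cells = []
        for f in fields:
            value = html.escape(row[f], quote=True)
            if f == link_field:
                value = '<a href="{0}">{0}</a>'.format(value)
            cells.append("<td>{}</td>".format(value))
        body_rows.append("<tr>" + "".join(cells) + "</tr>")
    body = "".join(body_rows)
    return ("<table><thead><tr>" + head + "</tr></thead>"
            "<tbody>" + body + "</tbody></table>")


def to_json(rows):
    """Serialize rows as JSON, the shape a simple API response might take."""
    return json.dumps({"count": len(rows), "results": rows}, indent=2)


if __name__ == "__main__":
    print(to_csv(ROWS, FIELDS))
    print(to_html_table(ROWS, FIELDS))
    print(to_json(ROWS))
```

Keeping the data in one structured source and generating each format from it means the PDF becomes just one of several outputs rather than the only copy.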

Beyond the PDF: Extracted Tables & Images

Beyond the PDF: Open Licensing

Beyond the PDF: Open Repositories and Commons

Deconstruct the PDF

Unlocking PDFs: Innovations

• PDF Liberation Hackathon: a list of PDF extraction resources and OCR technologies

• Smart Chicago’s primer on unlocking PDFs using Tabula, OpenRefine, and Google Fusion Tables

• OpenGov Foundation’s blog post on unlocking Congressional Financial Disclosure PDF forms

• Sunlight’s developers: What Word Where?
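To give a feel for what happens after text has been pulled out of a PDF, here is a minimal Python sketch that turns whitespace-aligned text into rows that can be analyzed and aggregated. It is a naive heuristic (split on runs of two or more spaces), not how Tabula or any of the tools above work internally, and the sample text, organization names, and amounts are invented for illustration.

```python
import re

# Hypothetical text as it might come out of a PDF text-extraction tool:
# columns are separated by runs of two or more spaces.
EXTRACTED = """\
Organization          State   Amount
Example Fund          IL      $1,200
Sample Trust Inc.     NY      $15,000
"""


def split_columns(line):
    """Split one line into cells on runs of two or more whitespace characters."""
    return [cell.strip() for cell in re.split(r"\s{2,}", line.strip())]


def text_to_rows(text):
    """Use the first non-empty line as the header and map later lines onto it."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = split_columns(lines[0])
    rows = []
    for ln in lines[1:]:
        cells = split_columns(ln)
        if len(cells) != len(header):
            continue  # skip lines that do not match the header's column count
        rows.append(dict(zip(header, cells)))
    return rows


def parse_amount(value):
    """Convert a string such as '$1,200' to the integer 1200."""
    return int(value.replace("$", "").replace(",", ""))


if __name__ == "__main__":
    rows = text_to_rows(EXTRACTED)
    total = sum(parse_amount(r["Amount"]) for r in rows)
    print(rows)
    print("Total:", total)
```

Real PDFs are usually messier than this (wrapped cells, merged columns, scanned pages that need OCR), which is why the dedicated tools listed above exist.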

Contact Us

Gabi Fitz, Foundation [email protected]

Janet Camarena, Foundation [email protected]

Tristan Mohabir, The Communications [email protected]

Amy Ngai, Sunlight [email protected]

Gabriela Schneider, Sunlight [email protected]