Post on 06-Aug-2015
What You’ll Learn Today
How to “deconstruct” PDFs and share them in ways that make them more open and usable
What limitations PDFs present, and why we need a shift in the way we share data
Ways that other nonprofits and foundations are sharing data in free, open, and repurposable ways
One-format-fits-all!
Limitations of the PDF
Takes…
.txt, .csv, .xls, .jpeg, .png, .dbf, .doc, etc.
And turns it into…
Limitations of the PDF
• Prevents data analysis
• Some PDFs do not allow you to select text
• Formatting limitations
• Limited searchability
• Inability to export charts/tables
• Cannot aggregate across documents
Beyond the PDF: Releasing Data
• Provide a downloadable link to the data
• Develop a data portal for users
• Create HTML/CSS tables that link back to PDFs or original data
• For developers and technologists: release the data through APIs
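As a rough illustration of the options above, the sketch below takes one small CSV dataset and produces both an HTML table that links back to the source file and a JSON payload of the kind an API might return. The sample data, the `data/grants.csv` path, and the function names are hypothetical, chosen only for this example.

```python
import csv
import html
import io
import json

# Hypothetical sample data; a real release would read the published CSV file.
SAMPLE_CSV = """grantee,year,amount
Example Org A,2014,25000
Example Org B,2015,40000
"""


def read_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def to_html_table(rows, source_url):
    """Render rows as an HTML table followed by a link back to the source data."""
    if not rows:
        return "<table></table>"
    headers = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(row[h])}</td>" for h in headers) + "</tr>"
        for row in rows
    )
    link = f'<a href="{html.escape(source_url, quote=True)}">Download the data</a>'
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>\n{link}"


def to_json(rows):
    """Serialize the same rows as JSON, the shape an API endpoint might serve."""
    return json.dumps(rows, indent=2)


rows = read_rows(SAMPLE_CSV)
print(to_html_table(rows, "data/grants.csv"))  # hypothetical download path
print(to_json(rows))
```

Because every output is generated from the same rows, the table, the download, and the API response stay consistent with one another, which is harder to guarantee when the only published copy is a PDF.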
Unlocking PDFs: Innovations
• PDF Liberation Hackathon: a list of PDF extraction resources and OCR technologies
• Smart Chicago’s Primer on unlocking PDFs using Tabula, OpenRefine and Google Fusion Tables
• OpenGov Foundation’s blog post on unlocking Congressional Financial Disclosure PDF forms
• Sunlight’s developers: What Word Where?
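To give a sense of what comes after extraction, here is a minimal sketch that assumes the text of a table has already been pulled out of a PDF by one of the tools above. It splits each line into cells wherever two or more spaces appear and writes the result as CSV. The sample text and function names are hypothetical, and real extracted text is often messier than this.

```python
import csv
import io
import re

# Hypothetical text, standing in for the output of a PDF text extractor.
EXTRACTED_TEXT = """\
Program Area        Grants    Total Awarded
Education           12        $450,000
Health Services     7         $1,200,000
"""


def text_to_rows(text):
    """Split each non-empty line into cells at runs of two or more spaces."""
    rows = []
    for line in text.splitlines():
        if line.strip():
            rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text, quoting cells that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


table = text_to_rows(EXTRACTED_TEXT)
print(rows_to_csv(table))
```

Splitting on runs of spaces keeps multi-word cells such as "Health Services" together, but it fails when a column is empty or when cells are separated by a single space, so the output still needs a manual check or a cleanup pass in a tool like OpenRefine.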
Contact Us
Gabi Fitz, Foundation Center, gvf@foundationcenter.org
Janet Camarena, Foundation Center, jfc@foundationcenter.org
Tristan Mohabir, The Communications Network, tmohabir@comnetwork.org
Amy Ngai, Sunlight Foundation, angai@sunlightfoundation.com
Gabriela Schneider, Sunlight Foundation, gschneider@sunlightfoundation.com