What You’ll Learn Today
How to “deconstruct” PDFs and share them in ways that make them more open and usable
What limitations PDFs present, and why we need a shift in the way we share data
Ways that other nonprofits and foundations are sharing data in free, open, and repurposable ways
One-format-fits-all!
Limitations of the PDF
Takes…
.txt, .csv, .xls, .jpeg, .png, .dbf, .doc, etc.
And turns it into…
• Prevents data analysis
• Some PDFs do not allow you to select text
• Formatting limitations
• Searchability
• Inability to export charts/tables
• Cannot aggregate across documents
Beyond the PDF: Releasing Data
• Provide a downloadable link to the data
• Develop a data portal for users
• Create HTML/CSS tables that link back to PDFs or original data
• For developers and technologists: release the data as APIs
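As a rough illustration of the release options above, the sketch below takes one small dataset and produces a CSV file body (for a downloadable link), an HTML table, and a JSON payload of the kind an API might return. It is only a minimal sketch: the `GRANTS` records, their field names, and the helper names `to_csv`, `to_html_table`, and `to_json` are hypothetical and do not come from the presentation.

```python
import csv
import html
import io
import json

# Hypothetical sample records; names and figures are illustrative only.
GRANTS = [
    {"grantee": "Example Arts Fund", "year": 2013, "amount_usd": 25000},
    {"grantee": "Sample Health & Wellness Trust", "year": 2014, "amount_usd": 40000},
]


def to_csv(rows):
    """Serialize rows as CSV text, e.g. to offer behind a download link."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_html_table(rows):
    """Render rows as a plain HTML table, escaping every cell value."""
    columns = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row[col]))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"


def to_json(rows):
    """Serialize rows as JSON, the shape a simple data API might return."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(GRANTS))
    print(to_html_table(GRANTS))
    print(to_json(GRANTS))
```

Keeping the data in one structured source and generating each format from it means the download, the web table, and the API response cannot drift apart.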
Unlocking PDFs: Innovations
• PDF Liberation Hackathon: list of PDF extraction resources and OCR technologies
• Smart Chicago’s Primer on unlocking PDFs using Tabula, OpenRefine and Google Fusion Tables
• OpenGov Foundation’s blog post on unlocking Congressional Financial Disclosure PDF forms
• Sunlight’s developers: What Word Where?
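To give a sense of what happens after a tool has pulled text out of a PDF, here is a small sketch of one common clean-up step: turning whitespace-aligned columns back into structured rows. It assumes the text has already been extracted by some other tool; it does not read PDFs itself and is not how any of the resources listed above work internally. `SAMPLE_TEXT`, `parse_aligned_table`, and `to_int` are hypothetical names and data made up for illustration.

```python
import re

# Hypothetical text as it might come out of a PDF text extractor:
# columns are separated by runs of two or more spaces.
SAMPLE_TEXT = """\
Program Area        Grants   Total (USD)
Arts and Culture    12       340,000
Public Health       7        1,250,500
"""


def parse_aligned_table(text):
    """Split each non-empty line on runs of 2+ spaces and key cells by header."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", lines[0])
    rows = []
    for line in lines[1:]:
        cells = re.split(r"\s{2,}", line)
        if len(cells) != len(header):
            continue  # skip lines whose cell count does not match the header
        rows.append(dict(zip(header, cells)))
    return rows


def to_int(value):
    """Convert a number formatted with thousands separators, e.g. '340,000'."""
    return int(value.replace(",", ""))


if __name__ == "__main__":
    rows = parse_aligned_table(SAMPLE_TEXT)
    for row in rows:
        print(row)
    print("Total:", sum(to_int(row["Total (USD)"]) for row in rows))
```

Once the rows are structured like this, the analysis and aggregation that a PDF blocks become ordinary data work. Real extractions are usually messier (wrapped cells, merged columns, OCR errors), so this heuristic should be treated as a starting point only.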
Contact Us
Gabi Fitz, Foundation [email protected]
Janet Camarena, Foundation [email protected]
Tristan Mohabir, The Communications [email protected]
Amy Ngai, Sunlight [email protected]
Gabriela Schneider, Sunlight [email protected]