The PDF Is the Enemy (but It Doesn't Have to Be)
What You’ll Learn Today
How to “deconstruct” PDFs and share them in ways that make them more open and usable
The limitations PDFs present, and why we need a shift in the way we share data
Ways that other nonprofits and foundations are sharing data in free, open, and repurposable ways
Limitations of the PDF
One-format-fits-all!
Takes…
.txt .csv .xls .jpeg .png .dbf .doc .etc
And turns it into…
• Prevents data analysis
• Some PDFs do not allow you to select text
• Formatting limitations
• Limited searchability
• Inability to export charts/tables
• Cannot aggregate across documents
Beyond the PDF: Releasing Data
• Provide a link to download the data
• Develop a data portal for users
• Create HTML/CSS tables that link back to PDFs or original data
• For developers and technologists: release the data through APIs
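As a rough illustration of the options above, the sketch below publishes one small table three ways: as a downloadable CSV, as the kind of JSON an API might return, and as an HTML table that links back to the source PDF. The records, field names, and URL are hypothetical placeholders, not data from any real report, and the JSON shape is only one possible convention.

```python
import csv
import html
import io
import json

# Hypothetical sample records; a real report would supply its own data.
GRANTS = [
    {"grantee": "Example Org A", "year": 2013, "amount_usd": 25000},
    {"grantee": "Example Org B", "year": 2013, "amount_usd": 40000},
]


def to_csv(rows):
    """Serialize the records as CSV text suitable for a download link."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows):
    """Serialize the records as JSON, the way a simple API response might."""
    return json.dumps({"count": len(rows), "results": rows}, indent=2)


def to_html_table(rows, source_url):
    """Render the records as an HTML table with a link back to the original PDF."""
    headers = list(rows[0].keys())
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(str(r[h]))}</td>" for h in headers) + "</tr>"
        for r in rows
    )
    link = f'<a href="{html.escape(source_url, quote=True)}">Original PDF</a>'
    return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>\n{link}"


if __name__ == "__main__":
    print(to_csv(GRANTS))
    print(to_json(GRANTS))
    print(to_html_table(GRANTS, "https://example.org/report.pdf"))
```

Keeping the records in one structure and generating every format from it means the PDF, the download, and the API cannot drift apart.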
Unlocking PDFs: Innovations
• PDF Liberation Hackathon: a list of PDF extraction resources and OCR technologies
• Smart Chicago’s Primer on unlocking PDFs using Tabula, OpenRefine and Google Fusion Tables
• OpenGov Foundation’s blog post on unlocking Congressional Financial Disclosure PDF forms
• Sunlight’s developers: What Word Where?
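To give a feel for what "unlocking" involves, here is a minimal sketch that turns text copied out of a PDF table into CSV. It assumes the columns arrive separated by runs of two or more spaces, which is often but not always the case. The sample text is hypothetical, and this is not how Tabula or the other tools above work internally; they can use the position of each word on the page and cope with far messier layouts.

```python
import csv
import io
import re

# Hypothetical text as it might come out of a PDF's text layer.
EXTRACTED = """\
Program Area        Grants    Total (USD)
Arts and Culture    12        340,000
Education           8         125,500
"""


def rows_from_text(text):
    """Split each non-empty line into cells wherever two or more spaces appear."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        rows.append(re.split(r"\s{2,}", line))
    return rows


def to_csv(rows):
    """Write the recovered rows as CSV text; cells containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(to_csv(rows_from_text(EXTRACTED)))
```

A cell that itself contains a double space, or a table whose columns are separated by a single space, would break this approach, which is one reason dedicated extraction tools exist.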
Contact Us
Gabi Fitz, Foundation [email protected]
Janet Camarena, Foundation [email protected]
Tristan Mohabir, The Communications [email protected]
Amy Ngai, Sunlight [email protected]
Gabriela Schneider, Sunlight [email protected]