Giddens ecn2013
Getting collection data, maps, and images online via open source and commercial solutions
Michael Giddens
Software developer with a focus in biodiversity informatics.
Follow me @silverbiology
What we do
• Design workflows and software to optimize image capture
• Analyze & process label images
• Create portals for entomological and scientific collections
• Develop interactive maps to tell stories about data
• Provide support and technical advice for NSF projects
Digitization & Data Capture
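• Seconds count
• 28,800 seconds in an 8-hour work day
• 100k @ 30 seconds = 34.7 days of round-the-clock work (about 104 eight-hour work days)
• 100k @ 29 seconds = 33.6 days
• …
• 100k @ 15 seconds = 17.4 days (about 52 eight-hour work days)
• Humans are not robots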
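The arithmetic behind these figures can be checked with a short sketch. It assumes 100,000 specimens and reports both continuous 24-hour days (which is what the 34.7-day figure corresponds to) and 8-hour work days. The function and constant names are made up for illustration.

```python
SECONDS_PER_WORK_DAY = 8 * 60 * 60        # 28,800 seconds in an 8-hour work day
SECONDS_PER_CALENDAR_DAY = 24 * 60 * 60   # 86,400 seconds in a 24-hour day


def days_needed(specimens: int, seconds_each: float, seconds_per_day: int) -> float:
    """Return how many days it takes to handle `specimens` at `seconds_each` apiece."""
    return specimens * seconds_each / seconds_per_day


if __name__ == "__main__":
    for secs in (30, 29, 15):
        calendar = days_needed(100_000, secs, SECONDS_PER_CALENDAR_DAY)
        work = days_needed(100_000, secs, SECONDS_PER_WORK_DAY)
        print(f"{secs:>2} s/specimen: {calendar:.1f} calendar days, {work:.1f} work days")
```

Shaving a single second per specimen saves 100,000 seconds in total, which is roughly three and a half 8-hour work days.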
Solutions
• Look at every action as a micro task
• Find tasks to fill any wait time
• Stick to a single workflow
• Filename conventions are important
• Stick with the image sizes and formats you need
• Rename files using scanners or data entry, e.g. SilverImage
• Back up images!!!
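A filename convention can be enforced in code rather than by hand. The sketch below is not SilverImage. It assumes a hypothetical convention of `<collection code>_<zero-padded catalog number>.<ext>`, with the catalog number supplied by a barcode scan or data entry.

```python
import re
from pathlib import Path


def conventional_name(collection_code: str, catalog_number: int, ext: str = "jpg") -> str:
    """Build a filename such as 'ENT_00001234.jpg' (hypothetical convention)."""
    if not re.fullmatch(r"[A-Z]+", collection_code):
        raise ValueError("collection code must be upper-case letters only")
    if catalog_number < 0:
        raise ValueError("catalog number must not be negative")
    # Normalise the extension: lower case, no leading dot.
    return f"{collection_code}_{catalog_number:08d}.{ext.lower().lstrip('.')}"


def rename_image(path: Path, collection_code: str, catalog_number: int) -> Path:
    """Rename one captured image in place and return its new path."""
    target = path.with_name(conventional_name(collection_code, catalog_number, path.suffix))
    if target.exists():
        raise FileExistsError(f"{target} already exists; refusing to overwrite")
    return path.rename(target)
```

Refusing to overwrite an existing target is a deliberate choice here: a duplicate catalog number usually signals a scanning or typing mistake that a person should look at.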
Things we learned
• Make sure your lighting environment does not change
• Dragon dictation is not accurate enough for numbers or scientific words
• Manually renaming files is slow
• Some student workers do not care as much as you do about your collection
• People get burned out
Data Processing
• Optical Character Recognition Engines
• Machine Learning
• Crowd Sourcing
• Human in the Middle
Optical Character Recognition Engines
• Free
  – Tesseract
• Commercial
  – OmniPage
  – ABBYY
• Services
  – www.silverbiology.com
• Font Training
• No handwriting solution on market
Machine Learning
• Data Dictionaries
• Conditional logic / Decision Trees
• Past data to predict future data
• Label / Word Boundaries
• Orientation
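As one illustration of how a data dictionary can clean up OCR output, the sketch below matches noisy tokens against a list of known names using Python's standard `difflib` module. The dictionary contents, the 0.8 cutoff, and the function name are assumptions for illustration, not the approach described in the talk.

```python
from difflib import get_close_matches
from typing import Optional, Sequence

# Hypothetical data dictionary; in practice this would come from a taxonomic authority file.
KNOWN_GENERA = ["Bombus", "Apis", "Xylocopa", "Megachile"]


def correct_token(token: str, dictionary: Sequence[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest dictionary entry for an OCR token, or None if nothing is close enough."""
    matches = get_close_matches(token, dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # "Xy1ocopa" has a digit one where OCR misread a lower-case L.
    print(correct_token("Xy1ocopa", KNOWN_GENERA))
```

Tokens that fall below the cutoff return `None`, which makes them natural candidates for a human verification step.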
Crowd Sourcing
Notes From Nature
• http://www.notesfromnature.org
Calbug – Essig Museum Collections
Ornithological – from Natural History Museum
ALA Volunteer Program
• http://volunteer.ala.org.au
Human In The Middle
• Rotating Images
• Tagging Areas
• Metadata tagging
• Identifying False Positives
• Verification Steps
• Bulk Validation
Web Portals
• In-House
• Specify 6 Portal
• Symbiota
• SilverCollection
  – California Academy of Sciences
  – Angelo State Natural History Museum
  – Louisiana State Arthropod Museum
  – Kansas State University Entomology Dept.
  – Mississippi Entomological Museum
  – NLBIF
Explore / Browse
• Taxonomy
• Taxonomy (Filtered)
• Family
• Genus
• Type Status
• Regions
• Collectors
• Custom
Custom Checklists
Spreadsheet Format
Collecting Events
Images
Specimen Details
Reports
Interactive Maps
• Online service to map, analyze and build applications with your data
• Simple to use
• Easily create distribution maps, heat maps, and category maps
• Access to full geospatial query engine
• Visualizing ecological models
• Works well with lots of data
Visualizing two months in the life of seagull Eric
Blog on Lifewatch by Peter Desmet
Interactive Occurrence Data
Interactive Map Modes
Density Maps Polygons Grids
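A grid-based density map boils down to counting occurrences per cell. This minimal sketch bins latitude/longitude pairs into square cells. The cell size and function name are assumptions for illustration, and it is not the mapping service's implementation.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def grid_counts(points: Iterable[Tuple[float, float]], cell_deg: float = 1.0) -> Counter:
    """Count (lat, lon) points per grid cell, keyed by the cell's south-west corner index."""
    counts: Counter = Counter()
    for lat, lon in points:
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # skip records with impossible coordinates
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts
```

Each key is a pair of integer cell indices, so multiplying by `cell_deg` recovers the cell's south-west corner in degrees. The counts can then be mapped to colours for a heat or density layer.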
Useful Tools Provided By the Global Biodiversity Information Facility
http://tools.gbif.org
• Darwin Core Archive Assistant
• Darwin Core Archive Validator
• Higher Taxonomy Services
• Name Finder
• Name Parser
• GBIF API Services
• Integrated Publishing Toolkit (IPT)
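Tools such as the Darwin Core Archive Validator and the IPT work on data whose columns are named with Darwin Core terms. As a hedged sketch, the following writes specimen records to a tab-delimited file using a handful of standard Darwin Core term names as column headers. The function name and column selection are assumptions for illustration, and a complete archive would also need a `meta.xml` descriptor, which is not shown.

```python
import csv
from typing import Dict, Iterable, TextIO

# A small subset of Darwin Core terms, used here as column headers.
DWC_COLUMNS = [
    "occurrenceID",
    "basisOfRecord",
    "catalogNumber",
    "scientificName",
    "recordedBy",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def write_occurrences(records: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write records as tab-delimited text with Darwin Core column names."""
    writer = csv.DictWriter(
        out,
        fieldnames=DWC_COLUMNS,
        delimiter="\t",
        extrasaction="raise",   # reject keys that are not in DWC_COLUMNS
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing keys are written as empty fields
```

Rejecting unknown keys catches misspelled term names early, before the file reaches a validator.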
Global Names Architecture
http://www.globalnames.org
• Global Names Recognition and Discovery
• Global Names Index
Questions?
Michael Giddens www.silverbiology.com