Giddens ecn2013

Getting collection data, maps, and images online via open source and commercial solutions Michael Giddens


Transcript of Giddens ecn2013

Page 1: Giddens ecn2013

Getting collection data, maps, and images online via open source and commercial solutions

Michael Giddens

Page 2: Giddens ecn2013

Software developer with a focus in biodiversity informatics.

Follow me @silverbiology

Page 3: Giddens ecn2013

What we do

• Design workflows and software to optimize image capture
• Analyze & process label images
• Create portals for entomological and scientific collections
• Develop interactive maps to tell stories about data
• Provide support and technical advice for NSF projects

Page 4: Giddens ecn2013

Digitization & Data Capture

• Seconds count
• 28,800 seconds in an 8-hour work day
• 100k @ 30 seconds = 34.7 days of continuous (24-hour) work
• 100k @ 29 seconds = 33.5 days
• …
• 100k @ 15 seconds = 17.3 days
• Humans are not robots
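The arithmetic behind these figures can be sketched in a few lines. The day totals on the slide appear to correspond to round-the-clock days of 86,400 seconds (to within rounding), so this sketch also reports the equivalent in 8-hour work days of 28,800 seconds. The function name and the choice to report both units are assumptions made for illustration.

```python
SECONDS_PER_WORK_DAY = 8 * 60 * 60        # 28,800, the figure quoted on the slide
SECONDS_PER_CALENDAR_DAY = 24 * 60 * 60   # 86,400, continuous work with no breaks


def days_needed(items, seconds_per_item, seconds_per_day):
    """Return how many days it takes to process `items` at a fixed pace."""
    return items * seconds_per_item / seconds_per_day


if __name__ == "__main__":
    for secs in (30, 29, 15):
        continuous = days_needed(100_000, secs, SECONDS_PER_CALENDAR_DAY)
        work = days_needed(100_000, secs, SECONDS_PER_WORK_DAY)
        print(f"100k @ {secs:2d} s: {continuous:5.1f} continuous days, "
              f"{work:6.1f} eight-hour work days")
```

Seen this way, every second saved per item is worth roughly three and a half 8-hour work days across 100k items.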

Page 5: Giddens ecn2013

Solutions

• Look at every action as a micro task
• Find tasks to fill any wait time
• Stick to a single workflow
• Filename conventions are important
• Stick with the image sizes and formats needed
• Rename files using scanners or data entry, e.g. SilverImage
• Back up images!!!
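As an illustration of a filename convention combined with scanner-driven renaming, here is a minimal sketch. The convention itself (collection code, zero-padded catalog number, view), the `DEMO` code, and both function names are hypothetical; this is not a description of how SilverImage works.

```python
import re
from pathlib import Path

# Hypothetical convention: <COLLECTION>_<8-digit catalog number>_<view>.<ext>
FILENAME_PATTERN = re.compile(r"^[A-Z]+_\d{8}_[a-z]+\.(tif|jpg|png)$")


def build_filename(collection_code, catalog_number, view="dorsal", ext="jpg"):
    """Build a filename that follows the assumed convention."""
    return (f"{collection_code.upper()}_{int(catalog_number):08d}"
            f"_{view.lower()}.{ext.lower()}")


def rename_from_scan(image_path, scanned_barcode, collection_code, view="dorsal"):
    """Rename one captured image using a value read by a barcode scanner."""
    source = Path(image_path)
    new_name = build_filename(collection_code, scanned_barcode, view,
                              source.suffix.lstrip("."))
    target = source.with_name(new_name)
    if target.exists():
        # Never silently overwrite an existing image.
        raise FileExistsError(f"refusing to overwrite {target}")
    return source.rename(target)
```

Checking each name against `FILENAME_PATTERN` before upload is one way to catch files that slipped through with camera-default names.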

Page 6: Giddens ecn2013

Things we learned

• Make sure your lighting environment does not change
• Dragon dictation is not accurate enough for numbers or scientific words
• Manually renaming files is slow
• Some student workers do not care as much as you do about your collection
• People get burned out

Page 7: Giddens ecn2013

Data Processing

• Optical Character Recognition Engines
• Machine Learning
• Crowd Sourcing
• Human in the Middle

Page 8: Giddens ecn2013

Optical Character Recognition Engines

• Free
  – Tesseract
• Commercial
  – OmniPage
  – ABBYY
• Services
  – www.silverbiology.com
• Font Training
• No handwriting solution on the market
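For the free option, a label image can be passed to the Tesseract command-line tool from a script. The sketch below assumes Tesseract 4 or later is installed and on the PATH (older releases spell the page segmentation flag differently); the function names and the default of page segmentation mode 6 (a single uniform block of text) are illustrative choices, not recommendations from the talk.

```python
import subprocess


def build_tesseract_command(image_path, lang="eng", psm=6):
    """Assemble the command line for one OCR run."""
    # Using "stdout" as the output base makes Tesseract print the text
    # instead of writing a .txt file next to the image.
    return ["tesseract", str(image_path), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_label(image_path, lang="eng", psm=6):
    """Run Tesseract on one label image and return the recognized text."""
    result = subprocess.run(build_tesseract_command(image_path, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping command construction separate from execution makes it easy to log or batch the exact commands used for a digitization run.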

Page 9: Giddens ecn2013

Machine Learning

• Data Dictionaries
• Conditional logic / Decision Trees
• Past data to predict future data
• Label / Word Boundaries
• Orientation
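The data dictionary idea can be illustrated with a small sketch: each OCR token is compared against a list of known terms and replaced when a close match exists. This uses fuzzy string matching from Python's standard library as a stand-in; the term list, the cutoff value, and the function name are assumptions, and the talk does not say which matching method was used.

```python
import difflib

# Hypothetical dictionary of terms expected on labels.
KNOWN_TERMS = ["Coleoptera", "Diptera", "Hymenoptera", "Lepidoptera"]


def correct_token(token, dictionary=KNOWN_TERMS, cutoff=0.8):
    """Return the closest dictionary term, or the token unchanged if none is close."""
    lookup = {term.lower(): term for term in dictionary}
    matches = difflib.get_close_matches(token.lower(), list(lookup),
                                        n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else token
```

A higher cutoff makes corrections more conservative, which matters when a wrong "correction" is worse than leaving the raw OCR text for a human to review.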

Page 10: Giddens ecn2013

Crowd Sourcing

Notes From Nature

• http://www.notesfromnature.org

Calbug – Essig Museum Collections

Ornithological from Natural History Museum

Page 11: Giddens ecn2013

ALA Volunteer Program

• http://volunteer.ala.org.au

Page 12: Giddens ecn2013

Human In The Middle

• Rotating Images
• Tagging Areas
• Metadata tagging
• Identifying False Positives
• Verification Steps
• Bulk Validation
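One way to picture bulk validation is an automated pass that flags suspicious records so a person only reviews the exceptions. The sketch below is an assumption about what such a check could look like; it borrows the Darwin Core field names `decimalLatitude`, `decimalLongitude`, and `eventDate`, and the specific rules are illustrative only.

```python
from datetime import date


def _check_coordinate(record, field, limit, problems):
    """Append a problem message if the coordinate is missing, non-numeric, or out of range."""
    try:
        value = float(record.get(field, ""))
    except (TypeError, ValueError):
        problems.append(f"{field} not numeric")
        return
    if not -limit <= value <= limit:
        problems.append(f"{field} out of range")


def validate_record(record):
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = []
    _check_coordinate(record, "decimalLatitude", 90, problems)
    _check_coordinate(record, "decimalLongitude", 180, problems)
    try:
        date.fromisoformat(record.get("eventDate", ""))
    except (TypeError, ValueError):
        problems.append("eventDate not YYYY-MM-DD")
    return problems


def bulk_validate(records):
    """Map the index of each flagged record to its problems, for human review."""
    flagged = {}
    for index, record in enumerate(records):
        problems = validate_record(record)
        if problems:
            flagged[index] = problems
    return flagged
```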

Page 13: Giddens ecn2013

Web Portals

• In-House
• Specify 6 Portal
• Symbiota
• SilverCollection
  – California Academy of Sciences
  – Angelo State Natural History Museum
  – Louisiana State Arthropod Museum
  – Kansas State University Entomology Dept.
  – Mississippi Entomological Museum
  – NLBIF

Page 14: Giddens ecn2013

Explore / Browse

• Taxonomy
• Taxonomy (Filtered)
• Family
• Genus
• Type Status
• Regions
• Collectors
• Custom

Page 15: Giddens ecn2013

Custom Checklists

Page 16: Giddens ecn2013

Spreadsheet Format

Page 17: Giddens ecn2013

Collecting Events

Page 18: Giddens ecn2013

Images

Page 19: Giddens ecn2013

Specimen Details

Page 20: Giddens ecn2013

Reports

Page 21: Giddens ecn2013

Interactive Maps

Page 22: Giddens ecn2013

• Online service to Map, Analyze and Build applications with your data
• Simple to use
• Easily create distribution maps, heat maps, and category maps
• Access to full geospatial query engine
• Visualizing ecological models
• Works well with lots of data

Page 23: Giddens ecn2013

GBIF - 350 Million Records

http://www.gbif.org/occurrence

Page 24: Giddens ecn2013

Visualizing two months in the life of seagull Eric

Blog on Lifewatch by Peter Desmet

Page 25: Giddens ecn2013

Interactive Occurrence Data

Page 26: Giddens ecn2013

Interactive Map Modes

Density Maps Polygons Grids
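The grid mode can be approximated by binning occurrence points into fixed-size latitude/longitude cells and counting the points in each cell. This is a minimal sketch of that idea, not the mapping service's implementation; the function name and the one-degree default cell size are assumptions.

```python
import math
from collections import Counter


def grid_counts(points, cell_size_deg=1.0):
    """Count (lat, lon) points per grid cell, keyed by each cell's south-west corner."""
    counts = Counter()
    for lat, lon in points:
        # math.floor keeps negative coordinates in the correct cell,
        # unlike int(), which truncates toward zero.
        cell = (math.floor(lat / cell_size_deg) * cell_size_deg,
                math.floor(lon / cell_size_deg) * cell_size_deg)
        counts[cell] += 1
    return counts
```

The resulting counts can drive a color ramp per cell, which is essentially what a gridded density map displays.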

Page 27: Giddens ecn2013

Useful Tools Provided By the Global Biodiversity Information Facility

http://tools.gbif.org

• Darwin Core Archive Assistant
• Darwin Core Archive Validator
• Higher Taxonomy Services
• Name Finder
• Name Parser
• GBIF API Services
• Integrated Publishing Toolkit (IPT)
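As a rough illustration of using GBIF's API services from a script, the sketch below builds and sends an occurrence search request. It assumes the public GBIF v1 REST API at `api.gbif.org`, which may differ from the services available when this talk was given; the function names and default parameters are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

GBIF_API = "https://api.gbif.org/v1"  # assumed base URL of the public v1 API


def occurrence_search_url(scientific_name, limit=20, has_coordinate=True):
    """Build an occurrence search URL for one scientific name."""
    params = {
        "scientificName": scientific_name,
        "limit": limit,
        "hasCoordinate": str(has_coordinate).lower(),
    }
    return f"{GBIF_API}/occurrence/search?{urlencode(params)}"


def fetch_occurrences(scientific_name, limit=20):
    """Return (total match count, list of occurrence records) from GBIF."""
    with urlopen(occurrence_search_url(scientific_name, limit), timeout=30) as response:
        payload = json.load(response)
    return payload["count"], payload["results"]
```

Restricting the search to records with coordinates keeps the results directly usable for mapping.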

Page 28: Giddens ecn2013

Global Names Architecture

http://www.globalnames.org

• Global Names Recognition and Discovery
• Global Names Index

Page 29: Giddens ecn2013

Questions?

Michael Giddens www.silverbiology.com