Price2 ecn2013

Post on 10-May-2015


Rapid, industrial scale digitization of the NHM microscope slide collection

Ben Price & Vladimir Blagoderov

Outline

• The NHM slide collection

• What is Digitization?

• The NHM workflow

• Psyllid collection

• Future prospects

The NHM slide collection

• ~ 2 million slides (60 : 40 vertical : horizontal storage)

The NHM slide collection

• Mix of slide sizes, mounts, storage cabinets

What is Digitization?


Label data:

– Quick to image

• 5000 per day

– Slow to transcribe (crowdsourcing)

– Slow to georeference (crowdsourcing)

What is Digitization?

Specimen:

– Slow to image

• 100,000 per year

– Data storage

• GB images

– Image delivery

• Proprietary software

– Do we need ALL specimens?

The NHM workflow*

Preparation Handling Imaging Post Processing Data Capture

* Work in progress

Preparation Handling Imaging Post Processing Data Capture

• Datamatrix Labels (4.5mm)

• Processing Scripts (GIMP, Barcodefiler)

• Computing Facilities (64bit, 16GB RAM)

• Storage & Retrieval (Ke-EMu)

– What is a slide?

• Delivery (NHM data portal)

Preparation Handling Imaging Post Processing Data Capture

• Horizontal vs Vertical storage

• Card Slide covers!

• Labelling & Handling = up to 90% of the time

Preparation Handling Imaging Post Processing Data Capture

• Scanner – SLR – Mamiya Leaf – SatScanner

• Balance slides per image vs label resolution (PPI)

• Single slide imaging?
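
The balance between slides per image and label resolution can be made concrete by asking how many pixels span one 4.5 mm Datamatrix label at a given PPI. The following is a minimal sketch in Python; the function name is hypothetical, and the slides do not state how many pixels a barcode reader actually needs.

```python
MM_PER_INCH = 25.4

def label_pixels(ppi, label_mm=4.5):
    """Pixels spanning one edge of a square label of the given size."""
    return ppi * label_mm / MM_PER_INCH

# Compare the PPI values used in the resolution tests.
for ppi in (250, 300, 450, 600):
    print(f"{ppi} PPI: {label_pixels(ppi):.0f} px across a 4.5 mm label")
```

Halving the PPI roughly quadruples the area covered by one image but halves the pixels available across each label, which is the trade-off the bullet describes.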

Preparation Handling Imaging Post Processing Data Capture

Horizontal Storage:

• Less handling

– Tray fits A3 scanner / SLR

• Can be autocropped

Preparation Handling Imaging Post Processing Data Capture

Horizontal Storage:

• Less handling

– Tray fits A3 scanner / SLR

• Manual cropping

– Crowd cropping?

Preparation Handling Imaging Post Processing Data Capture

Vertical storage:

• Single type of template (post processing)

• High contrast (scripts)

• Cheap (foam, card)

• More Handling

• Autocropping
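
High contrast between slides and template is what makes autocropping tractable. The sketch below illustrates one simple approach, projection-based segmentation on a thresholded grayscale image. It is not the NHM script (the slides mention GIMP scripts without detail); the function names, the threshold, and the assumption of bright slides on a dark background are all illustrative.

```python
def runs(flags):
    """Return (start, end) pairs for consecutive True values; end is exclusive."""
    out, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(flags)))
    return out

def crop_boxes(image, threshold=128):
    """Find (top, left, bottom, right) boxes of bright regions.

    image is a list of rows of grayscale values (0-255). Assumes slides
    are brighter than the background and laid out in rows on the template.
    """
    bright = [[pixel > threshold for pixel in row] for row in image]
    boxes = []
    # First split the image into horizontal bands that contain bright pixels.
    for top, bottom in runs([any(row) for row in bright]):
        band = bright[top:bottom]
        # Then split each band into columns that contain bright pixels.
        col_flags = [any(row[c] for row in band) for c in range(len(image[0]))]
        for left, right in runs(col_flags):
            boxes.append((top, left, bottom, right))
    return boxes

# Tiny synthetic "scan": two bright slides side by side on a dark template.
scan = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 255, 255, 0, 255, 255, 0],
    [0, 255, 255, 0, 255, 255, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
print(crop_boxes(scan))
```

A single template type helps here because the same threshold and layout assumptions hold for every image.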

Preparation Handling Imaging Post Processing Data Capture

• Resolution tests (PPI)

– Canon 650D (18MP sensor) + 50mm Macro

PPI      250  300  450  600
Slides    72   45   18   10
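
The relationship between PPI and slides per image follows from the sensor's pixel dimensions. The sketch below gives a rough upper bound, assuming a 5184 x 3456 pixel frame (the nominal figure for an 18MP Canon sensor, not stated in the slides), standard 3 x 1 inch slides, and no gaps between slides. Real counts will be lower because of spacing and tray layout.

```python
import math

def slides_per_frame(px_w, px_h, ppi, slide_w_in=3.0, slide_h_in=1.0):
    """Upper bound on slides per image, trying both slide orientations."""
    # Field of view in inches at the requested resolution.
    fov_w = px_w / ppi
    fov_h = px_h / ppi
    landscape = math.floor(fov_w / slide_w_in) * math.floor(fov_h / slide_h_in)
    portrait = math.floor(fov_w / slide_h_in) * math.floor(fov_h / slide_w_in)
    return max(landscape, portrait)

# Assumed frame size for the 18MP camera.
for ppi in (250, 300, 450, 600):
    print(ppi, slides_per_frame(5184, 3456, ppi))
```

Under these assumptions the bound falls steeply as PPI rises, the same trend as the measured slide counts.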

Preparation Handling Imaging Post Processing Data Capture

• Resolution tests (PPI)

– Mamiya Leaf (80MP sensor) + 80mm lens

PPI      300  450  600
Slides   180   72   50

Preparation Handling Imaging Post Processing Data Capture

• Resolution tests (PPI)

– HerbScanner (EPSON A3 size)

PPI      300  450  600
Slides    50   50   50

Preparation Handling Imaging Post Processing Data Capture

• Resolution tests (PPI)

– SatScanner (0.16x lens, low resolution ~1000 PPI)

72 - 100 slides


Progress to date

• Psyllidae slide collection (4000 slides)

• Two digitizers + SatScanner = 4 days

• Handling (not Imaging) is the bottleneck

• Solutions:

– More digitizers

– Crowd cropping of tray scans?

Progress to date

• Theoretical maximum

– SatScanner: 7000 slides per day (5-8 people)

– Other: 700 - 1000 slides per person per day

• NHM Entomology collection = 10 – 15 person years
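
The person-year estimate can be checked with simple arithmetic. The sketch below assumes the ~2 million slide figure applies to this estimate and that one person works 200 days per year; both the working-day count and the function name are assumptions, not figures from the slides.

```python
def person_years(total_slides, slides_per_person_day, working_days_per_year=200):
    """Person-years needed to process a collection at a given daily rate."""
    return total_slides / slides_per_person_day / working_days_per_year

TOTAL_SLIDES = 2_000_000  # approximate size of the slide collection

fast = person_years(TOTAL_SLIDES, 1000)  # upper rate per person per day
slow = person_years(TOTAL_SLIDES, 700)   # lower rate per person per day
print(f"{fast:.1f} to {slow:.1f} person years")
```

With these inputs the result falls within the 10 to 15 person-year range quoted above.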

[Diagram: workflow cycle of label, load, image and unload steps, numbered 1-4]

Future Plans

• Specimen Imaging

– Type material

Acknowledgments

Flavia

Johanna

Elisa

Peter

Lyndsey

Sara

Questions?