Data cleansing for Dummies: Google to the rescue!! Dave Smith Petrology Collections Manager.

Page 1

Data cleansing for Dummies: Google to the rescue!!

Dave Smith, Petrology Collections Manager

Page 2

The Natural History Museum, London

Page 3

Architectural wonders

• Waterhouse building opened in 1881

• Steel frame and terracotta

• Purpose-built for natural history collections

Page 4

The Museum

• 1000 staff

• 350 science staff

• 72 million specimens (estimated)

• Life Sciences

– Plants, animals, birds, insects

• Earth Sciences

– Minerals & gems, rocks, fossils, meteorites

Page 5

My role

• Geologist by training

• Collections Manager for rock collections

– 125,000 rocks

– 10,000 decorative stones

– 37,000 ocean sediments

– 16,000 ore specimens

• Departmental EMu administrator

– Registry management

– Report writing

– Training & documentation

– EMu support & upgrade testing

– Communication

Page 6

‘Fingers in lots of pies’

• Have been involved in cross-museum initiatives involving EMu.

Page 7

Data cleansing for Dummies: Google to the rescue!!

Dave Smith, Petrology Collections Manager

Page 8

The problem

Page 9

Core Information

• 89,000 Records (73%)

– Identification = 52,100

– Provenance = 64,215

– Acquisition = 38,700

– Storage = 14,300

Page 13

Numbers

Register volume        Acquisition records    Specimen records
1-5                    634                    19,283
1-5 (supplementary)    501 (490)              1,965 (1,927)
1-5 (merged)           1,124                  21,210
6-11                   1,832                  30,080
Geological Society     510                    9,852
TOTAL                  3,466                  63,107
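
As a quick sanity check, the arithmetic in the table can be reproduced in a few lines of Python. This is only a sketch under two assumptions the slide does not state: that the bracketed supplementary figures are the counts left after removing overlap with registers 1-5, and that the total is meant to be the sum of the merged, 6-11 and Geological Society rows. Under that reading the merged row and the acquisition total add up, while the specimen column sums to 61,142, which is 1,965 short of the stated total (the same as the unbracketed supplementary count).

```python
# Figures copied from the Numbers slide: (acquisition records, specimen records).
registers_1_5 = (634, 19283)
supplementary_bracketed = (490, 1927)   # assumed: counts after removing overlap
merged_1_5 = (1124, 21210)
registers_6_11 = (1832, 30080)
geological_society = (510, 9852)
stated_total = (3466, 63107)

# The merged row should be registers 1-5 plus the bracketed supplementary figures.
merged_check = tuple(a + b for a, b in zip(registers_1_5, supplementary_bracketed))

# Sum the three rows assumed to make up the total.
rows = [merged_1_5, registers_6_11, geological_society]
acq_sum = sum(r[0] for r in rows)
spec_sum = sum(r[1] for r in rows)

print("merged row reproduced:", merged_check == merged_1_5)
print("acquisition sum:", acq_sum, "stated:", stated_total[0])
print("specimen sum:", spec_sum, "stated:", stated_total[1])
print("specimen difference:", stated_total[1] - spec_sum)
```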

Page 14

The Problem

• Data sits outside EMu – how to get it in?

• Not as easy as it sounds – many barriers…

• Notes field used as a placeholder for data whose correct field was uncertain.

• Sites data atomised to varying levels, depending on the experience of the digitiser.
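
To illustrate what "atomisation" of sites data involves, here is a minimal Python sketch that splits a free-text locality string into coarse fields. It assumes the string is comma-separated with the most general term last. The function name, the field names (`country`, `province`, `precise_locality`) and the example string are hypothetical and are not taken from EMu or from the register data.

```python
def atomise_site(raw: str) -> dict:
    """Split a comma-separated locality string into coarse fields.

    Assumes the most general term (country) comes last. The field names
    are hypothetical, not real EMu column names.
    """
    parts = [p.strip() for p in raw.split(",") if p.strip()]
    fields = {"country": "", "province": "", "precise_locality": ""}
    if parts:
        fields["country"] = parts.pop()
    if parts:
        fields["province"] = parts.pop()
    # Whatever is left is kept together rather than guessed at.
    fields["precise_locality"] = ", ".join(parts)
    return fields


# Invented example string, for illustration only.
example = atomise_site("Shap Quarry, Cumbria, England")
print(example)
```

A record entered by a less experienced digitiser might hold only "England" in a single field; the same function then fills `country` and leaves the finer fields empty, which makes the uneven level of detail easy to spot.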

Page 15

Acquisition Lot entry

Page 17

The Problem

• Data sits outside EMu – how to get it in?

• Not as easy as it sounds – many barriers…

• Notes field used as a placeholder for data whose correct field was uncertain.

• Sites data atomised to varying levels, depending on the experience of the digitiser.

• Approx. 95% of specimens already have a record in EMu with, at minimum, a registration number. Once the data is cleaned, how do we update those records without overwriting enhanced data?

• Unfamiliarity with Access

• Short time periods for data cleansing.
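
The "update without overwriting" problem above can be sketched in Python as a merge that only fills gaps. This is an illustration under assumptions, not the Museum's actual import procedure: records are modelled as plain dictionaries matched on registration number, and the function name, field names and sample values are all invented.

```python
def merge_without_overwrite(existing: dict, cleaned: dict) -> dict:
    """Return a copy of `existing` with gaps filled from `cleaned`.

    A field counts as empty if it is missing, None, or only whitespace.
    Fields that already hold data are left untouched.
    """
    merged = dict(existing)
    for field, value in cleaned.items():
        current = merged.get(field)
        if current is None or str(current).strip() == "":
            merged[field] = value
    return merged


# Invented sample records, for illustration only.
existing = {"reg_number": "BM.1234", "rock_name": "Granite (pink, coarse)", "locality": ""}
cleaned = {"reg_number": "BM.1234", "rock_name": "granite",
           "locality": "Shap, Cumbria", "collector": "J. Smith"}

result = merge_without_overwrite(existing, cleaned)
print(result)
```

Here the enhanced `rock_name` already in the existing record survives, while the empty `locality` and the missing `collector` are filled from the cleaned register data.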

Page 18

The Solution

• Google Refine

• OpenRefine (GitHub)

• Personal web service

• Runs in your browser

Page 19

The demo

Page 20

Benefits

• Intuitive User Interface

• Powerful editing / data manipulation functions

• Can’t make mistakes! Endless undo…!

• Pick up where you left off – maintains history

• Link to open-data sources to validate your data

• Augment your data with free open data sources.
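
One example of the data manipulation functions mentioned above is OpenRefine's key-collision clustering, which groups spelling and ordering variants of the same value. The Python sketch below approximates the "fingerprint" keying method described in the OpenRefine documentation. It is not OpenRefine's own code, the function names are made up for this example, and the sample values are invented.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate a fingerprint key: fold accents to ASCII, lowercase,
    strip punctuation, then join the sorted unique tokens."""
    ascii_text = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_text.strip().lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


# Invented sample values, for illustration only.
names = ["Granite, Pink", "pink granite ", "Basalt", "PINK  GRANITE"]
clusters = cluster(names)
print(clusters)
```

Each cluster is a candidate set of variants that a person can review and merge into one preferred spelling.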