The Importance of Standardization of the Data Format: A Case Study from the National Herbarium of...

Post on 29-Dec-2015

The Importance of Standardization of the Data Format: A Case Study from the National Herbarium of the Netherlands

Universiteit Leiden

Willemse, L.P.M., Mols, J.B., Welzen, P.C. van & Smets, E.F.

(willemse@nhn.leidenuniv.nl)

Standardization:

The process of ensuring that the electronic storage of (biodiversity) information is consistent within and between databases in such a way that information is similarly structured and the individual basic data elements have an identical format.

Data format:

The exact layout and text arrangement of the contents of a basic data element, including guidelines on how to deal with exceptional or extraordinary instances.

NHN digitization

• start 1996

• project - permanent

• taxon – region - group

• 850,000 specimens (out of 5.5 million)

• off-the-shelf software

• three branches

Introduction

www.nationaalherbarium.nl/virtual/

Data structure

• HISPID

Introduction

Standardization of:

Data

• plant distribution

• ISO country code

• author abbreviation

• collections

• periodicals

Protocol:

• description

• domain/value

• syntax

• guidelines

• annexes
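The protocol elements listed above can be pictured as one record per data field. The following is a minimal sketch, not the NHN's actual protocol: the class name `FieldProtocol`, its attribute names, and the three example country codes are assumptions chosen for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FieldProtocol:
    """Hypothetical record describing the standard for one data field."""
    description: str                     # what the field holds
    syntax: str                          # regular expression the value must match
    domain: Optional[frozenset] = None   # allowed values, if the list is closed
    guidelines: str = ""                 # how to treat exceptional cases
    annexes: list = field(default_factory=list)  # e.g. lookup tables

    def is_valid(self, value: str) -> bool:
        # The value must match the syntax and, if a domain is given, be in it.
        if re.fullmatch(self.syntax, value) is None:
            return False
        return self.domain is None or value in self.domain


# Illustrative only: a tiny subset of two-letter ISO country codes.
country = FieldProtocol(
    description="ISO country code",
    syntax=r"[A-Z]{2}",
    domain=frozenset({"NL", "ID", "MY"}),
)
```

With this sketch, `country.is_valid("NL")` is accepted, while `"nl"` fails on syntax and `"XX"` fails on domain.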

Introduction

Software used at the NHN: BRAHMS

Data files with botanical specimen data handled between 1996 and 2006

• from the Leiden branch of the NHN

• from the other two branches of the NHN

• from institutes worldwide

• from herbaria participating in SEABCIN (KEP, SAN, SAR, BKF, SING, BO, PNH)

Data format

www.seabcin.org

The same

Different

Collector names:

www.nationaalherbarium.nl/fmcollectors/

Data format

Collector names:

• different notation

Data format

Collector

Willemse, L.P.M.

Willemse, L.

Willemse
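The three notations above may all refer to one person, but a plain string comparison treats them as three different collectors. As a minimal sketch, assuming names are stored as "Surname, Initials" (the function names are hypothetical), two entries can be flagged as possibly identical when the surname matches and the shorter set of initials is a prefix of the longer one:

```python
def parse_collector(name):
    """Split 'Surname, Initials' into (surname, list of initials)."""
    surname, _, rest = name.partition(",")
    initials = [p for p in rest.replace(" ", "").split(".") if p]
    return surname.strip(), initials


def may_be_same(a, b):
    """True if the two entries could denote the same collector."""
    surname_a, initials_a = parse_collector(a)
    surname_b, initials_b = parse_collector(b)
    if surname_a.lower() != surname_b.lower():
        return False
    # Compare only as many initials as the shorter entry provides.
    n = min(len(initials_a), len(initials_b))
    return initials_a[:n] == initials_b[:n]
```

A match here is only a candidate: "Willemse" without initials is compatible with every Willemse, so a human check or a reference list of collectors is still needed.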

Collector                        Addcoll
Mogea                            W.J.J.O. de Wilde
J.P.Mogea                        W.J.J.O. de Wilde
Mogea, J.P.                      Wilde, W.J.J.O. de
Mogea & W.J.J.O. de Wilde
Mogea; W.J.J.O. de Wilde
J.P.Mogea & W.J.J.O. de Wilde
J.Mogea & W. de Wilde
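When additional collectors are typed into the Collector field, they have to be separated out before records can be compared. The sketch below assumes that only "&" and ";" act as separators; the comma is deliberately not treated as one because it also occurs inside names such as "Mogea, J.P.". The function name is hypothetical.

```python
import re

# Separators assumed for this sketch: ampersand and semicolon.
SEPARATORS = re.compile(r"\s*(?:&|;)\s*")


def split_collectors(collector, addcoll=""):
    """Return (main collector, list of additional collectors)."""
    names = [n for n in SEPARATORS.split(collector) if n]
    if addcoll:
        names += [n for n in SEPARATORS.split(addcoll) if n]
    if not names:
        return "", []
    return names[0], names[1:]
```

Both `split_collectors("Mogea & W.J.J.O. de Wilde")` and `split_collectors("Mogea", "W.J.J.O. de Wilde")` yield the same pair, so the two ways of entering the record become comparable.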

Data format

Collector names:

• different notation

• more collectors

Addcoll

P.C. van Welzen

van Welzen, P.C.

Van Welzen, P.C.

Welzen, P.C. van
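The four spellings above differ only in where the preposition "van" is placed and how it is capitalized. A minimal sketch of bringing them to one form follows; the target form "Surname, Initials preposition" and the short preposition list are assumptions made for illustration, and the function name is hypothetical.

```python
# Illustrative, incomplete list of surname prepositions.
PREPOSITIONS = {"van", "de", "der", "den", "von"}


def normalize_prepositions(name):
    """Rewrite a name as 'Surname, Initials preposition'."""
    if "," in name:
        surname, rest = (part.strip() for part in name.split(",", 1))
        surname_words, other_words = surname.split(), rest.split()
    else:
        # Without a comma, treat words containing a full stop as initials.
        words = name.split()
        other_words = [w for w in words if "." in w]
        surname_words = [w for w in words if "." not in w]
    preps = [w.lower() for w in surname_words + other_words
             if w.lower() in PREPOSITIONS]
    core = [w for w in surname_words if w.lower() not in PREPOSITIONS]
    initials = [w for w in other_words if w.lower() not in PREPOSITIONS]
    if not core:
        return name  # cannot identify a surname; leave unchanged
    tail = " ".join(initials + preps)
    return " ".join(core) + (", " + tail if tail else "")
```

Names written without spaces, such as "J.P.Mogea", are returned unchanged by this sketch because no surname can be separated from the initials.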

Data format

Collector names:

• different notation

• more collectors

• prepositions

• titles/ranks & composition

titles/ranks

Dr./Ir./Prof.

Hj. (Haji)

Mr./Mrs./M.

Père/Father

F.G./D.O.

composition

Sarifa Abu Bakar

Postar Jaiwit Miun

Yu Ming Ju

Ignatius Bernard

Duanis Guritam
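Titles and ranks placed in front of a name can be removed mechanically once a list of them is agreed on. The sketch below uses part of the list above; "M." is left out on purpose because it cannot be told apart from an initial, and "F.G." and "D.O." are left out for the same reason. The function name is hypothetical.

```python
# Titles taken from the list above; ambiguous ones are omitted deliberately.
TITLES = {"dr.", "ir.", "prof.", "hj.", "mr.", "mrs.", "père", "father"}


def strip_titles(name):
    """Remove any leading titles or ranks from a collector name."""
    words = name.split()
    while words and words[0].lower() in TITLES:
        words.pop(0)
    return " ".join(words)
```

Composite names without a clear surname, such as those listed under composition, pass through unchanged; deciding how to store them remains a matter for the guidelines.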

Data format

Collector Addcoll Prefix Number

Anderson, J.A.R. S 27682

Anderson, J.A.R. S27682

S Anderson, J.A.R. 27682
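In the rows above the series letter "S" ends up in different positions depending on who entered the record. As a minimal sketch, assuming a prefix consists of letters only and is followed by a run of digits, a raw number string can be split the same way regardless of spacing (the function name is hypothetical):

```python
import re

# Optional letter prefix, optional spaces, then digits.
NUMBER = re.compile(r"^\s*([A-Za-z]*)\s*(\d+)\s*$")


def split_prefix(raw):
    """Return (prefix, number) as strings, or None if the pattern fails."""
    match = NUMBER.match(raw)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

The number is kept as a string so that leading zeros, if any, are not lost. Values that do not fit the pattern return `None` and are left for manual review.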

Data format

Collector number:

• institute series

• used more than once

Collector     Addcoll         Prefix   Number   Suffix
Clemens, J.   Clemens, M.S.            30339
Clemens, J.   Clemens, M.S.            30339    A!

Data format

Collector number:

• institute series

• used more than once

• more series

Collector     Prefix   Number   Suffix
Beccari, O.   Musci    167
Beccari, O.            167

Data format

Collector number:

• institute series

• used more than once

• more series

• compound numbers

Prefix Number Suffix

920419-1/7

920419-1/ 7

920419- 1 /7
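The three compound numbers above differ only in the spaces around the hyphen and the slash. A minimal sketch of removing that variation, assuming whitespace next to "-" and "/" carries no meaning (the function name is hypothetical):

```python
import re


def normalize_compound(raw):
    """Remove whitespace around '-' and '/' in a compound number."""
    return re.sub(r"\s*([-/])\s*", r"\1", raw.strip())
```

All three variants then compare as equal, which a direct string comparison would not show.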

Conclusions

• Data from different sources are rarely completely consistent

• Differences in format are “difficult” to resolve compared with differences in structure

• Differences interfere with data exchange

• Many factors cause these differences

• Standardization of the data format is underrated

Recommendations

• more effort in promoting standards and finding means to enforce them

• expand HISPID

• develop standard specimen labels

• standard for collector names