Standardizing Mansfeld's World Database of Agricultural and Horticultural Crops by Implementing a Concept-Based Data Model
Ram Narang and Helmut Knüpffer
Leibniz Institute of Plant Genetics and Crop Plant Research, D-06466 Gatersleben
Introduction
The integration of species-related information from multiple sources in federated information systems or web portals faces the problem that the sources follow different taxonomic approaches. Many global and local taxonomic databases, among them ITIS and Species2000, provide information about species based on a single taxonomic view, in which information is attached to a single accepted (or preferred) name. Taxonomic opinions and standards vary with time, place and investigator, and depend on many factors such as the geographical range of the study, the interpretation of collected specimens, the fossil record, morphology, genetics and molecular phylogeny. New classifications may arise from more detailed studies of specimens, the discovery of new taxonomic information, or the description of new species and groupings. Consequently, biological taxa often have multiple names, which in turn may have been applied to multiple taxon concepts. When data from diverse sources are combined into a single database or portal, these different standards have to be reconciled. In addition, the increasing use of DNA sequence comparison as a tool to analyse phylogenetic relationships is accelerating the rate of taxonomic revision, so taxonomy is unlikely to stabilize in the foreseeable future. Therefore, a data model that represents multiple, alternative taxonomic views is crucial for sound taxonomic information management.
The Berlin Data Model for Taxonomic Information
Various data models have been developed to support the representation of multiple, alternative taxonomic views in taxonomic databases (cf. Kennedy et al. 2006), among them the Berlin Model (Berendsohn et al. 2003), which is based on the IOPI model. The Berlin Model allows alternative taxonomic concepts (potential taxa) to be used as the basis for species information. A number of projects, such as the Euro+Med PlantBase, AlgaTerra, MoReTax, the IOPI Global Plant Checklist, the Dendroflora of El Salvador and Med-Checklist, have implemented the core of the Berlin Model as a taxonomic backbone for their databases and have contributed to its continuous development and optimization (http://www.bgbm.org/biodivinf/docs/bgbm-model/). In addition, the Berlin Model underlies several tools for taxonomic data management tasks such as taxonomic revision, data import from external sources, data integrity checking and data publishing on the World Wide Web.
The core of the Berlin Model contains four central functional sections: (1) Taxon Names, (2) Potential Taxa (taxonomic concepts), (3) Facts and (4) References. Taxon names are botanical names according to the International Code of Botanical Nomenclature (ICBN). The combination of such a name with a reference forms a taxonym (also called a potential taxon or taxon concept). An auxiliary section, Authors, assembles the author teams for the nomenclatural references. Finally, the Facts section can store any kind of factual information.
[Figure: core components Name, Taxon Concept, Reference, Facts and Relation]
Basic data integrity rules in the Berlin Model are implemented at the level of tables, keys and relations within the database model. For example, the rule that every botanical name must have a rank can be enforced with a foreign key to the table listing the valid ranks. More complex rules and functions, e.g. for constructing syntactically correct botanical names, are implemented as stored procedures and triggers. Triggers are functions executed automatically when certain database events occur; for example, one trigger automatically rebuilds an author team when one of its author names is changed.
Mansfeld’s World Database of Agricultural and Horticultural Crops
The Mansfeld Database (http://mansfeld.ipk-gatersleben.de) is an online database developed at IPK since 1998, initially as a contribution to the project “Federal Information System on Genetic Resources” (BIG, http://www.big-flora.de/). It reflects the contents of “Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops” (Hanelt and IPK 2001) and contains information on ca. 6,100 crop plant species, excluding forestry and ornamental plants. Each species entry provides nomenclature and synonymy, common names in different languages, the distribution of the species in the wild and its regions of cultivation, uses, images and references, as well as the ancestral species and notes on phylogeny, variation and history. Originally developed under Microsoft Visual FoxPro, the Mansfeld Database has recently been migrated to the Oracle 10g database platform, and the procedures for the web interface have been re-programmed.
Taxonomy Module of the Mansfeld Database
Like many other global taxonomic checklists, the Mansfeld Database represents a single taxonomic view of nomenclatural information. It incorporates classifications that have gained broad acceptance in the taxonomic literature and among taxonomists working on the taxa concerned, and thus offers an opportunity to standardize scientific nomenclature and taxonomy for cultivated plant species. Alternative taxonomic views (indicated by phrases such as sensu, amend., etc.) are currently stored as part of the nomenclatural reference. Similarly, authors and bibliographical references are not yet atomized into individual attributes. These information items need to be parsed and represented in the entity-relationship model to allow a concept-based view of the taxon.
Implementing the Multiple Taxonomic Concepts Model
In a first implementation step, the latest version of the Berlin Core Model, a database model under MS SQL Server, was migrated to Oracle 10g. All database procedures, functions and triggers that implement taxonomic logic were translated into their PL/SQL equivalents.
Nomenclatural and bibliographical data of the Mansfeld Database were atomised using Java programs. The parsed information was tagged and stored in an XML file. The resulting soft-schema XML file was read with JDOM and corrected manually (a time-consuming task) to produce a strict-schema XML file, which was used to populate the tables in the Taxon, Reference and Potential Taxon sections of the Berlin taxonomic model. After the taxonomic core was completed, the remaining information from the Mansfeld Database, such as textual information on geographical distribution and uses, was linked to the potential taxa as factual data. Finally, the web interface was adapted (re-programmed) to the new data model.
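The core ideas of the Berlin Model described above (a taxon name combined with a reference forms a potential taxon, facts are attached to that concept, a foreign key enforces valid ranks, and a trigger rebuilds an author team when an author name changes) can be illustrated with a small sketch using Python's built-in sqlite3 module. This is a hypothetical, heavily simplified schema, not the actual Berlin Model tables and not the Oracle/PL/SQL implementation used for the Mansfeld Database: the table, column and trigger names (taxon_rank, ref, potential_taxon, fact, rebuild_author_team, etc.) and the sample records are placeholders chosen for illustration.

```python
import sqlite3

# Hypothetical, much-simplified schema in the spirit of the Berlin Model core.
# Table and column names are illustrative, not the published model.
SCHEMA = """
CREATE TABLE taxon_rank (rank_id INTEGER PRIMARY KEY, rank_abbr TEXT NOT NULL UNIQUE);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE author_team (team_id INTEGER PRIMARY KEY, team_text TEXT NOT NULL DEFAULT '');
CREATE TABLE team_member (
    team_id   INTEGER NOT NULL REFERENCES author_team(team_id),
    author_id INTEGER NOT NULL REFERENCES author(author_id),
    PRIMARY KEY (team_id, author_id)
);
-- Simple rule via a foreign key: every name must have a valid rank.
CREATE TABLE taxon_name (
    name_id   INTEGER PRIMARY KEY,
    full_name TEXT NOT NULL,
    rank_id   INTEGER NOT NULL REFERENCES taxon_rank(rank_id)
);
CREATE TABLE ref (
    ref_id  INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    team_id INTEGER REFERENCES author_team(team_id)
);
-- A potential taxon (taxon concept) is one name as used in one reference.
CREATE TABLE potential_taxon (
    ptaxon_id INTEGER PRIMARY KEY,
    name_id   INTEGER NOT NULL REFERENCES taxon_name(name_id),
    ref_id    INTEGER NOT NULL REFERENCES ref(ref_id),
    UNIQUE (name_id, ref_id)
);
-- Facts are attached to the concept, not to the bare name.
CREATE TABLE fact (
    fact_id   INTEGER PRIMARY KEY,
    ptaxon_id INTEGER NOT NULL REFERENCES potential_taxon(ptaxon_id),
    fact_text TEXT NOT NULL
);
-- More complex rule via a trigger: rebuild cached team strings after an
-- author is renamed. (group_concat order is not guaranteed here.)
CREATE TRIGGER rebuild_author_team AFTER UPDATE OF name ON author
BEGIN
    UPDATE author_team
    SET team_text = (
        SELECT group_concat(a.name, ' & ')
        FROM team_member AS m JOIN author AS a ON a.author_id = m.author_id
        WHERE m.team_id = author_team.team_id
    )
    WHERE team_id IN (SELECT team_id FROM team_member WHERE author_id = NEW.author_id);
END;
"""


def build_demo() -> sqlite3.Connection:
    """In-memory database in which one name is used in two references."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
    con.executescript(SCHEMA)
    con.execute("INSERT INTO taxon_rank VALUES (1, 'sp.')")
    con.executemany("INSERT INTO author VALUES (?, ?)", [(1, "Author A"), (2, "Author B")])
    con.execute("INSERT INTO author_team VALUES (1, 'Author A & Author B')")
    con.executemany("INSERT INTO team_member VALUES (1, ?)", [(1,), (2,)])
    con.execute("INSERT INTO taxon_name VALUES (1, 'Genus species', 1)")
    con.executemany("INSERT INTO ref VALUES (?, ?, ?)",
                    [(1, "Reference X", 1), (2, "Reference Y", None)])
    # Same name, two references: two distinct potential taxa.
    con.executemany("INSERT INTO potential_taxon VALUES (?, 1, ?)", [(1, 1), (2, 2)])
    con.executemany("INSERT INTO fact VALUES (?, ?, ?)",
                    [(1, 1, "fact under concept 1"), (2, 2, "fact under concept 2")])
    con.commit()
    return con


def facts_by_concept(con: sqlite3.Connection) -> list:
    """Return (potential taxon id, reference title, fact) rows."""
    return con.execute(
        "SELECT p.ptaxon_id, r.title, f.fact_text "
        "FROM potential_taxon AS p "
        "JOIN ref AS r ON r.ref_id = p.ref_id "
        "JOIN fact AS f ON f.ptaxon_id = p.ptaxon_id "
        "ORDER BY p.ptaxon_id"
    ).fetchall()
```

Calling build_demo() and then facts_by_concept() returns one row per potential taxon, so the two usages of the same name stay separate. Inserting a taxon_name row with an unknown rank_id raises sqlite3.IntegrityError, and renaming an author through an UPDATE on author refreshes the cached team_text of every team that author belongs to.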
[Figure: entity-relationship diagram of the taxonomy module tables botnam, taxa, syno, syn_stat, pp_stat, taxrang, gruppe, publikat, autoren, dubl_botnam, volksnam, vnam_tax, vnamtyp, taxa_soi and soi]
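The Taxonomy Module text notes that authors, bibliographical references and concept qualifiers such as sensu are still embedded in undivided strings, and the implementation step atomised them with Java programs and JDOM before loading. The following sketch swaps in Python and the standard-library xml.etree.ElementTree to illustrate the same idea for a single author citation: it splits the citation into basionym author, "ex" author, author and an optional sensu/non/amend. qualifier, tags the parts as XML, and lists deviations from a stricter schema that a curator would then fix by hand. The regular expressions, element names and the REQUIRED/ALLOWED rules are assumptions for illustration, not the parser or schemas actually used for the Mansfeld Database; real nomenclatural strings (e.g. with "in" or publication details) need many more rules.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns; real nomenclatural strings need many more rules.
QUALIFIER = re.compile(r"\s+(?P<q>sensu|non|amend\.)\s+(?P<ref>.+)$")
CITATION = re.compile(
    r"^(?:\((?P<basionym_author>[^)]+)\)\s*)?"  # "(L.)" = basionym author
    r"(?:(?P<ex_author>.+?)\s+ex\s+)?"          # "Ser. ex" = pre-author
    r"(?P<author>.+)$"                          # remaining author(s)
)

FIELDS = ("basionym_author", "ex_author", "author", "qualifier", "qualifier_ref")
REQUIRED = {"scientific", "author"}  # assumed "strict schema" rule
ALLOWED = REQUIRED | set(FIELDS)


def parse_citation(citation: str) -> dict:
    """Split an author citation such as '(L.) Mill. sensu X' into parts."""
    text = citation.strip()
    parts = {"qualifier": None, "qualifier_ref": None}
    q = QUALIFIER.search(text)
    if q:
        parts["qualifier"] = q.group("q")
        parts["qualifier_ref"] = q.group("ref").strip()
        text = text[: q.start()].strip()
    m = CITATION.match(text)
    if m is None:
        raise ValueError(f"cannot parse citation: {citation!r}")
    parts.update(m.groupdict())
    return parts


def to_xml(scientific: str, citation: str) -> ET.Element:
    """Tag the parsed parts as child elements of a <name> record."""
    record = ET.Element("name")
    ET.SubElement(record, "scientific").text = scientific
    parts = parse_citation(citation)
    for field in FIELDS:
        if parts.get(field):
            ET.SubElement(record, field).text = parts[field]
    return record


def strict_problems(record: ET.Element) -> list:
    """List deviations from the assumed strict schema for manual correction."""
    tags = [child.tag for child in record]
    problems = [f"missing <{t}>" for t in sorted(REQUIRED - set(tags))]
    problems += [f"unexpected <{t}>" for t in tags if t not in ALLOWED]
    return problems
```

For example, to_xml("Brassica napus", "L.") yields a record with only scientific and author children and no strict-schema problems, whereas "(L.) Mill." additionally produces a basionym_author element.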
The implementation of the Berlin Model in the Mansfeld Database facilitates standardisation and improves the quality of the taxonomic information by increasing its accuracy, resolution and interpretability. In addition, existing standard taxonomy management tools, such as web editors, can be adapted to the new concept-based model of the Mansfeld Database and used to update its contents. The extensive information on about 6,100 species of agricultural and horticultural crop plants will thus become more easily accessible to global biodiversity information portals.
[Figure: Web screenshots of the Mansfeld Database before the transformation to the Berlin Model]
[Figure: Mansfeld Database – Taxonomy module]
[Figure: Entity-relationship model of the potential taxon, with the entities TaxonRank, TaxonName, Potential Taxon, Name, ReferenceStatus, Assignment, AssignedStatus and ReferenceTitle, and the relationships "is accepted name", "assigns accepted name", "is higher taxon in classification", "gives status and other taxonomic information of" and "is classified"]
[Figure: Concept-oriented database core]
[Figure: Implementation steps I to III, from the Mansfeld Database via an XML soft schema and an XML strict schema to the conceptual database model]
Outlook
The Encyclopedia of Life (http://www.eol.org), launched in 2007, is developing “species pages” for all known organisms, with content to be provided and edited by experts from all over the world using a wiki-like editor. Its initial content is being gathered from existing web resources. The rich information on more than 6,000 of the economically most important plant species documented in the Mansfeld Database was offered for inclusion at the EoL Plant Species Pages Meeting (St. Louis, Missouri, 31 October to 2 November 2007).
The Global Biodiversity Information Facility (http://www.gbif.org) aims to provide free access to biodiversity information on the web using standardised web services. GBIF has approached the Mansfeld Database developers about making the database’s ca. 38,000 common names of crop plant species, in many languages, available to GBIF as a starting point for an interface that would allow the world’s biodiversity data to be queried via common names as well as scientific names. Fully integrating the Mansfeld Database into GBIF would also make its rich crop species information accessible alongside data from other providers of taxon-related data.
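Querying biodiversity data by common names, as envisaged with GBIF, requires matching user input against names in many languages and spellings. A minimal sketch, assuming a simple in-memory index, normalises case, whitespace and diacritics before lookup and returns every scientific name, together with its language, filed under the normalised key. The fold function and CommonNameIndex class are hypothetical illustrations, not a GBIF or Mansfeld Database API, and the sample entries are illustrative only.

```python
import unicodedata
from collections import defaultdict


def fold(text: str) -> str:
    """Case-, accent- and whitespace-insensitive key for a common name."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_marks.casefold().split())


class CommonNameIndex:
    """Hypothetical multilingual index: common name -> (scientific name, language)."""

    def __init__(self) -> None:
        self._entries = defaultdict(set)

    def add(self, common_name: str, language: str, scientific_name: str) -> None:
        self._entries[fold(common_name)].add((scientific_name, language))

    def lookup(self, query: str) -> list:
        # One common name may map to several taxa, so return all hits, sorted.
        return sorted(self._entries.get(fold(query), set()))


if __name__ == "__main__":
    index = CommonNameIndex()
    # Illustrative entries only, not data from the Mansfeld Database.
    index.add("maïs", "fr", "Zea mays L.")
    index.add("potato", "en", "Solanum tuberosum L.")
    index.add("pomme de terre", "fr", "Solanum tuberosum L.")
    print(index.lookup("MAIS"))
    print(index.lookup("  Pomme  de Terre "))
```

Folding diacritics lets "MAIS" find the French "maïs", but it can also merge names that differ only in accents; keeping the language code in each hit lets a client disambiguate.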
References
Berendsohn, W.G., M. Döring, M. Geoffroy, K. Glück, A. Güntsch, A. Hahn, W.-H. Kusber, J.L. Li, D. Röpert and F. Specht. 2003. The Berlin Model: a concept-based taxonomic information model. Pp. 15-26 in Berendsohn, W.G. (ed.), MoReTax: Handling Factual Information Linked to Taxonomic Concepts in Biology. Schriftenreihe für Vegetationskunde 39, Bonn.
Hanelt, P. and Institute of Plant Genetics and Crop Plant Research (eds). 2001. Mansfeld’s Encyclopedia of Agricultural and Horticultural Crops (Except Ornamentals). 6 vols. 1st Engl. ed. Springer, Berlin, Heidelberg, New York, etc. (LXX+3645 pp.)
Kennedy, J., R. Hyam, R. Kukla and T. Paterson. 2006. Standard data model representation for taxonomic information. OMICS: A Journal of Integrative Biology 10 (Special Issue on Data Standards), 220-230.