Controlled Vocabulary Working Group - 2013 PRESENTED BY JOHN PORTER.
Goal
Make it easy for researchers to find the data they need from LTER repositories by:
Enhancing searches through the use of a thesaurus that provides synonyms, narrower terms and related terms
Creating a browseable structure for locating datasets
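As a rough illustration of how a thesaurus can enhance searches, the sketch below expands a search string into its preferred term, synonyms and narrower terms before matching. It is a minimal sketch only: the `Term` class, the `THESAURUS` entries and the `expand_query` function are hypothetical and are not taken from the LTER vocabulary or from any LTER software.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """Relationships recorded for one preferred term (hypothetical structure)."""
    synonyms: set[str] = field(default_factory=set)   # non-preferred ("use for") terms
    narrower: set[str] = field(default_factory=set)   # more specific preferred terms
    related: set[str] = field(default_factory=set)    # associated preferred terms


# Illustrative entries only; not copied from the LTER Controlled Vocabulary.
THESAURUS = {
    "biomass": Term(
        synonyms={"standing crop"},
        narrower={"aboveground biomass"},
        related={"primary production"},
    ),
    "aboveground biomass": Term(),
    "primary production": Term(),
}


def expand_query(query: str, include_related: bool = False) -> set[str]:
    """Return the search string plus thesaurus terms that should also match."""
    q = query.strip().lower()

    # Map a synonym onto its preferred term, if one exists.
    preferred = q
    for name, term in THESAURUS.items():
        if q in term.synonyms:
            preferred = name
            break

    expanded = {q, preferred}
    entry = THESAURUS.get(preferred)
    if entry is None:
        return expanded  # unknown term: search on it unchanged

    expanded |= entry.synonyms

    # Follow narrower terms transitively so specific datasets are also found.
    stack = list(entry.narrower)
    while stack:
        name = stack.pop()
        if name in expanded:
            continue
        expanded.add(name)
        child = THESAURUS.get(name)
        if child is not None:
            expanded |= child.synonyms
            stack.extend(child.narrower)

    if include_related:
        expanded |= entry.related
    return expanded
```

Related terms are optional here because they widen a search rather than refine it; whether to include them by default is a design choice this transcript does not settle.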
2013 Goals
Enhance term list to incorporate:
New terms suggested by sites
Frequently searched terms
Frequently used terms
Terms related to human activities (social science)
More synonyms for existing terms that are found in LTER Metadata
Needed: Establish clear criteria for evaluating candidate terms (Best Practices)
Goals
Add definitions for terms in the Controlled Vocabulary
Create plans for dealing with taxonomic names and places that are currently not part of the existing Controlled Vocabulary
Workshop – May 2013
Pre-Workshop
Queried LTER Sites for new candidate terms – Melendez, Henshaw, Vanderbilt
Queried existing documents for words not currently in the Controlled Vocabulary – Gastil-Buhl
Queried logs for search terms used by Metacat users – Costa
Updated Tematres software to the latest version – Porter
Identified online sources for definitions – O’Brien, Vanderbilt
Investigated taxonomic web services and gazetteers – Gries
Note: the group favors using Taxonomic and Geographic Coverage elements rather than keywords for these elements
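The log query above can be pictured as a simple frequency count. The sketch below assumes the search strings have already been extracted from the logs into a list; the Metacat log format is not described here, so no parsing is attempted. The function name `candidate_terms` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def candidate_terms(search_queries, vocabulary, min_count=2):
    """List searched terms missing from the vocabulary, most frequent first.

    search_queries: iterable of raw search strings (assumed already extracted).
    vocabulary: iterable of terms already in the controlled vocabulary.
    min_count: assumed cut-off for "frequently searched"; not from the source.
    """
    known = {term.lower() for term in vocabulary}
    counts = Counter(q.strip().lower() for q in search_queries if q.strip())
    return [
        (term, n)
        for term, n in counts.most_common()
        if n >= min_count and term not in known
    ]


# Made-up queries for illustration.
queries = ["Biomass", "drought", "drought ", "fire", "Drought", "fire"]
print(candidate_terms(queries, vocabulary={"biomass"}))
```

The same counting approach would apply to words pulled from existing metadata documents.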
Workshop Participants 2013
LTER Information Managers: Margaret O’Brien, Kristen Vanderbilt, Donald Henshaw and John Porter
Professional Librarians from UVA: Sherry Lake and Ivey Glendon
Added a lot to our discussions
“about” vs. “contains” taxonomies
our focus is describing what datasets contain
“about” is much harder to define for data
Workshop Results 2013
New Terms
~ 230 terms were suggested by 4 sites
~ 75 terms were accepted and added to LTER Vocabulary
Reason for rejection was given for each term not added
~ 25 additional terms were added based on use at 3 or more LTER Sites or 2 or more sites with > 10 datasets
~ Several suggested terms were added as non-preferred (UF) terms
Definitions
309 new definitions added
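The usage-based rule for the additional terms (3 or more sites, or 2 or more sites with > 10 datasets) can be written as a small check. This is a sketch under one stated assumption: "> 10 datasets" is read as the total across all sites using the term, which the slide does not make explicit. The function name and the site labels are hypothetical.

```python
def qualifies_by_usage(datasets_per_site: dict[str, int]) -> bool:
    """Apply the usage rule to one term.

    datasets_per_site maps a site label to the number of that site's
    datasets using the term.
    ASSUMPTION: "> 10 datasets" means the combined total across sites.
    """
    using_sites = [site for site, n in datasets_per_site.items() if n > 0]
    total_datasets = sum(datasets_per_site[site] for site in using_sites)
    return len(using_sites) >= 3 or (len(using_sites) >= 2 and total_datasets > 10)


# Made-up counts for illustration.
print(qualifies_by_usage({"SITE_A": 1, "SITE_B": 1, "SITE_C": 1}))  # three sites
print(qualifies_by_usage({"SITE_A": 6, "SITE_B": 5}))               # two sites, 11 datasets
print(qualifies_by_usage({"SITE_A": 50}))                           # one site only
```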
Controlled Vocabulary Status
710 total preferred terms
200 synonyms (“use for” terms)
363 total definitions
Important Workshop Activities - 2013
Developed improved Best Practices for identifying additional terms for inclusion (http://im.lternet.edu/VocabBestPractices)
Including a table that lays out grounds for rejecting particular words
What | Rationale | Do’s | Problem Abbreviation
Keywords should be applied to a number of datasets across the LTER Network. | Data discovery is the goal, so keywords that find data are most useful. | Propose keywords that are used at several other sites, and numerous datasets | NR - not repeated in multiple datasets
Keywords should be used at more than one site | A goal is to enable cross-site searching | Propose keywords that are used at several other sites | A - absent from other sites
Avoid proposing stand-alone adjectives | Stand-alone adjectives imply an “of what” question. For example, “aboveground” raises the question “aboveground what?” | Propose nouns or possibly verbs, but not stand-alone adjectives. Preferred terms can include an adjective with an object (e.g., aboveground biomass) | ADJ - stand-alone adjective
Be specific | Vague or ill-defined terms are hard to consistently assign | Use specific, unambiguous and well-defined terms | V - Vague
Avoid duplicating concepts already in the Controlled Vocabulary | Duplicative keywords lead to inconsistent keyword assignments | Avoid duplication of nearly-equivalent terms | AWE - adequate alternative word exists
Keywords should be well-defined | Without definition and context some technical terms may be difficult to assess or place | Provide good definitions | NC - needs clarification or better definition
Proposed synonyms should have exact correspondence to the preferred term | Synonyms should not refer to different concepts than the associated preferred term | Select synonyms that are exact matches for the concept described by the preferred term | NS - not a synonym
Keywords should be terms that users frequently search on | Keywords that are not searched for by users are not particularly useful. | Propose keywords that are frequently used in searches | NU - not used for search
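Because a reason for rejection was recorded for each term not added, the problem abbreviations above lend themselves to a simple lookup. The sketch below copies the eight abbreviations from the table; the `rejection_note` helper and its output format are hypothetical and only show how a review decision might be recorded consistently.

```python
# Abbreviations and meanings as listed in the Best Practices table.
REJECTION_CODES = {
    "NR": "not repeated in multiple datasets",
    "A": "absent from other sites",
    "ADJ": "stand-alone adjective",
    "V": "vague",
    "AWE": "adequate alternative word exists",
    "NC": "needs clarification or better definition",
    "NS": "not a synonym",
    "NU": "not used for search",
}


def rejection_note(term: str, code: str) -> str:
    """Format a one-line rejection record (hypothetical format)."""
    key = code.strip().upper()
    if key not in REJECTION_CODES:
        raise ValueError(f"unknown rejection code: {code!r}")
    return f"{term}: rejected ({key} - {REJECTION_CODES[key]})"


print(rejection_note("aboveground", "ADJ"))
```

Rejecting unknown codes outright keeps reviewers from inventing ad hoc reasons that cannot be compared across sites.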
Vision
Refining the “Vision” for how the controlled vocabulary can be used to make PASTA and other NIS elements more effective
And link to other efforts such as DataOne, LODE and EnvThes
Optional workshop yesterday – tasks identified:
Identify systems and software tools that effectively exploit controlled vocabularies for searching/browsing and ranking
Metrics tools: help identify specific datasets that could benefit from additional keywords
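One possible shape for such a metrics tool is sketched below: it flags datasets whose keyword lists contain few controlled-vocabulary terms. This is an assumption about what the metric could be, not a description of an existing tool. The function name, the `min_terms` threshold and the dataset identifiers are all hypothetical.

```python
def datasets_needing_keywords(datasets, vocabulary, min_terms=3):
    """Flag datasets with fewer than min_terms controlled-vocabulary keywords.

    datasets: dict mapping a dataset identifier to its list of keywords.
    vocabulary: iterable of controlled-vocabulary terms.
    min_terms: assumed threshold; the source does not specify one.
    Returns (dataset_id, matched_count) pairs, fewest matches first.
    """
    known = {term.lower() for term in vocabulary}
    flagged = []
    for dataset_id, keywords in datasets.items():
        matched = {k.strip().lower() for k in keywords} & known
        if len(matched) < min_terms:
            flagged.append((dataset_id, len(matched)))
    return sorted(flagged, key=lambda item: (item[1], item[0]))


# Made-up datasets for illustration.
example = {
    "ds-1": ["biomass", "Soil moisture", "my local tag"],
    "ds-2": ["biomass"],
    "ds-3": [],
}
vocab = {"biomass", "soil moisture", "precipitation"}
print(datasets_needing_keywords(example, vocab, min_terms=2))
```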