SKOS-2-HIVE UNT workshop
Morning Session Schedule
Introductions and Exploring HIVE
Section 1: Knowledge Organization and Vocabulary Control
Section 2: From Thesauri to SKOS
BREAK
Section 3: From SKOS to HIVE
Section 4: Evaluating HIVE
Introductions
Hollie White
Exploring HIVE
http://hive.nescent.org
Section 1: Knowledge Organization and Vocabulary Control
Classical view of ILS languages
< Simple keyword lists | thesauri/CV | deeper thesauri (WordNet) | taxonomies | low-level ontologies | full/intricate ontologies (OWL) >
Greenberg’s Ontology Continuum
Types of Vocabulary Control
From least to most structure
Term lists
Controlled but semi-unstructured list
Example: ASU portal-- http://library.lib.asu.edu/search/y
Controlled Vocabulary
Less structured thesauri also referred to as subject heading lists
Example: MeSH -- http://www.nlm.nih.gov/mesh/MBrowser.html
Thesauri
Composed of indexing terms/descriptors
Example: NASA -- http://www.sti.nasa.gov/thesfrm1.htm
Types of Vocabulary Control (continued)
Taxonomy
A subject-based classification that arranges the terms in the controlled vocabulary into a hierarchy (Garshol 2004)
Example: ITIS--http://www.itis.gov/
(search Abutilon menziesii)
Ontology
A way to convey or represent a class (or classes) of things, and relationships among the classes.
Example: Gene Ontology--http://www.geneontology.org/
KOS used in Digital Libraries
Looked at 269 online digital libraries and collections
KOS used:
Locally developed taxonomy (113)
LCSH (78)
Author list (34)
Thesauri (26)
Alphabetical listing (20)
Geographic arrangement (16)
Shiri, A. and Chase-Kruszewski, S. (2009) Knowledge organization systems in North American digital library collections. Program: electronic library and information systems, 43(2), pp. 121-139.
Discussion:
Think about your own organization.
What type of controlled vocabularies, thesauri, and ontologies does your organization use for everyday work?
How do these vocabulary choices help you meet the goals of your institution?
See activity page
Section 2: From Thesauri to SKOS
Simple Knowledge Organization Systems
Classical view of ILS languages
< Simple keyword lists | thesauri/CV | deeper thesauri (WordNet) | taxonomies | low-level ontologies | full/intricate ontologies (OWL) >
SKOS
Common thesaural identifiers
SN Scope Note Instruction, e.g. don’t invert phrases
USE Use (another term in preference to this one)
UF Used For
BT Broader Term
NT Narrower Term
RT Related Term
Syndetic Relationships
Syndetic relationships are the conceptual connections between terms.
Three types of syndetic relationships: hierarchical, equivalent, associative
Hierarchical
Level of generality – both preferred terms
BT (broader term)
Birthday cakes
BT Cakes
NT (narrower term)
Cakes
NT Birthday cakes
…remember inheritance
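One way to picture the BT/NT pairing and the inheritance reminder is a small lookup table. The sketch below is illustrative only: the "Baked goods" level is a hypothetical addition that is not on the slide, and the function names are made up for this example.

```python
# BT (broader term) links, keyed by the narrower term.
# "Birthday cakes BT Cakes" comes from the slide; "Baked goods" is a
# hypothetical extra level added only to show a chain of broader terms.
BROADER = {
    "Birthday cakes": ["Cakes"],
    "Cakes": ["Baked goods"],
}


def invert(broader):
    """Derive NT (narrower term) links as the inverse of the BT links."""
    narrower = {}
    for term, parents in broader.items():
        for parent in parents:
            narrower.setdefault(parent, []).append(term)
    return narrower


def ancestors(term, broader):
    """Follow BT links upward and collect every broader term, nearest first."""
    found = []
    queue = list(broader.get(term, []))
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(broader.get(current, []))
    return found


NARROWER = invert(BROADER)
chain = ancestors("Birthday cakes", BROADER)
```

Here `NARROWER["Cakes"]` holds "Birthday cakes", mirroring the NT example, and `chain` lists everything a birthday cake "inherits" from.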
Equivalent
When two or more terms represent the same concept
One is the preferred term (descriptor), where all the information is collected
The other is the non-preferred term and helps the user find the appropriate term
Equivalent
• Non-preferred term USE Preferred term
– Biological diversification
USE Biodiversity
• Preferred term UF (used for) Non-preferred term
– Biodiversity
UF Biological diversification
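The USE/UF pairing can be sketched as a lookup from non-preferred to preferred terms. This is a minimal illustration using the Biodiversity example; the function names are hypothetical and not taken from any thesaurus tool.

```python
# USE links: non-preferred term -> preferred term (descriptor).
USE = {"Biological diversification": "Biodiversity"}


def preferred(term, use_map):
    """Return the preferred term; a term with no USE link is already preferred."""
    return use_map.get(term, term)


def used_for(use_map):
    """Derive the UF lists as the inverse of the USE links."""
    uf = {}
    for non_preferred, descriptor in use_map.items():
        uf.setdefault(descriptor, []).append(non_preferred)
    return uf


UF = used_for(USE)
resolved = preferred("Biological diversification", USE)
```

A search on the non-preferred term is redirected to the descriptor, which is the "helps the user find the appropriate term" role described above.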
Associative
One preferred term is related to another preferred term
Non-hierarchical
“See also” function
In any large thesaurus, a significant number of terms will mean similar things or cover related areas, without necessarily being synonyms or fitting into a defined hierarchy
Associative
• Related Terms (RT) can be used to show these links within the thesaurus
– Bed
RT Bedding
– Paint Brushes
RT Painting
– Vandalism
RT Hostility
– Programming
RT Software
Identifiers to SKOS code
• SN Scope Note = skos:scopeNote
• USE Use = skos:prefLabel
• UF Used For = skos:altLabel
• BT Broader Term = skos:broader
• NT Narrower Term = skos:narrower
• RT Related Term = skos:related
Each entry term has a skos:Concept
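The identifier-to-SKOS table above can be expressed as a simple dictionary. The sketch below only restates that table; the helper name `to_skos_statements` and the tuple layout are assumptions made for illustration.

```python
# The thesaurus identifiers from the list above, mapped to SKOS properties.
THESAURUS_TO_SKOS = {
    "SN": "skos:scopeNote",
    "USE": "skos:prefLabel",
    "UF": "skos:altLabel",
    "BT": "skos:broader",
    "NT": "skos:narrower",
    "RT": "skos:related",
}


def to_skos_statements(term, relations):
    """Turn (identifier, value) pairs for one term into (term, property, value) tuples."""
    return [(term, THESAURUS_TO_SKOS[code], value) for code, value in relations]


statements = to_skos_statements(
    "Biodiversity",
    [("UF", "Biological diversification")],
)
```

The result pairs the entry term with `skos:altLabel` and the non-preferred term, matching the UF row of the table.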
Terms vs. Concepts?
Example: Table
Lexical level : Table
Conceptual level :
What is a SKOS Concept?
Zygotes
BT Ova
NT Oocysts
RT Hemizygosity
RT Reproduction
RT Zygosity
UF Ookinetes
All these relationships make up a SKOS concept
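To see how the separate lines add up to one concept, the Zygotes record can be read into a single structure. This is a rough sketch: the one-relation-per-line layout and the `parse_record` helper are assumptions for illustration, not a format defined by the workshop.

```python
from collections import defaultdict

# The Zygotes entry, one relationship per line.
RECORD = """Zygotes
BT Ova
NT Oocysts
RT Hemizygosity
RT Reproduction
RT Zygosity
UF Ookinetes"""


def parse_record(text):
    """Group every relationship of one entry under its preferred term."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    concept = {"prefLabel": lines[0], "relations": defaultdict(list)}
    for line in lines[1:]:
        code, value = line.split(" ", 1)  # e.g. "RT Zygosity" -> ("RT", "Zygosity")
        concept["relations"][code].append(value)
    return concept


zygotes = parse_record(RECORD)
```

The single `zygotes` object now carries the label plus all BT, NT, RT, and UF links, which is the sense in which the relationships together make up the concept.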
Conceptualizing SKOS
See activity in packet
Example 1: Web view of NBII entry
XML
Extensible Markup Language
--Created by the World Wide Web Consortium (W3C).
--Used to mark up documents on the internet or electronic documents.
--Users get to describe the tags that are used and define how they are used.
XML encoding
NBII in XML
<CONCEPT>
<DESCRIPTOR>Desert plants</DESCRIPTOR>
<BT>Desert organisms</BT>
<BT>Plants</BT>
<NT>Succulents</NT>
<SC>ORIG Original</SC>
<STA>Approved</STA>
<TYP>Descriptor</TYP>
<INP>2007-08-14</INP>
<UPD>2007-08-14</UPD>
</CONCEPT>
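Because the NBII entry is plain XML, it can be read with a standard XML parser. The following is a minimal sketch using Python's built-in `xml.etree.ElementTree`; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The NBII entry from the slide.
NBII_XML = """<CONCEPT>
  <DESCRIPTOR>Desert plants</DESCRIPTOR>
  <BT>Desert organisms</BT>
  <BT>Plants</BT>
  <NT>Succulents</NT>
  <SC>ORIG Original</SC>
  <STA>Approved</STA>
  <TYP>Descriptor</TYP>
  <INP>2007-08-14</INP>
  <UPD>2007-08-14</UPD>
</CONCEPT>"""

concept = ET.fromstring(NBII_XML)

# The descriptor is the preferred term; BT and NT may each occur several times.
descriptor = concept.findtext("DESCRIPTOR")
broader = [element.text for element in concept.findall("BT")]
narrower = [element.text for element in concept.findall("NT")]
```

Note that the tag names (`DESCRIPTOR`, `BT`, `NT`) are specific to this vocabulary's XML, which is part of the motivation for moving to a shared format such as SKOS.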
Creating SKOS/XML
See activity online
RDF
Resource Description Framework
“is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats”
--from Wikipedia
RDF data model
Similar to Entity-Relationship or class diagrams
Makes statements about resources in subject-predicate-object expressions called “triples”
subject = resource
predicate = traits or aspects of the resource; expresses a relationship between the subject and the object
http://www.w3.org/TR/rdf-concepts/
The sky has the color blue
RDF triple:
a subject denoting "the sky"
a predicate denoting "has the color"
an object denoting "blue"
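The sky example can be written down as a three-part record. This is only a toy sketch with plain strings; real RDF identifies resources with URIs, and the `Triple` type and `objects` helper are made up for illustration.

```python
from collections import namedtuple

# A triple is a statement with exactly three parts.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

sky = Triple(subject="the sky", predicate="has the color", object="blue")

# A graph is simply a collection of such statements.
graph = {sky}


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.object for t in graph if t.subject == subject and t.predicate == predicate]


colors = objects(graph, "the sky", "has the color")
```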
Things to know about RDF
Everything can be identified by URIs
Resources and links can have types
Partial information is tolerated
There is no need for absolute truth
Evolution is supported
Minimalist design
http://www.w3.org/2001/12/semweb-fin/w3csw
Example of RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="http://hive.nescent.org/">
<dc:title>HIVE Web Interface</dc:title>
</rdf:Description>
</rdf:RDF>
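The RDF/XML example encodes one triple: the HIVE site has a Dublin Core title. A rough way to see this, assuming only Python's built-in XML parser (a dedicated RDF library would normally be used), is to pull the subject, predicate, and object out of a well-formed copy of the markup:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://hive.nescent.org/">
    <dc:title>HIVE Web Interface</dc:title>
  </rdf:Description>
</rdf:RDF>"""

root = ET.fromstring(RDF_XML)
triples = []
for description in root.findall(f"{{{RDF_NS}}}Description"):
    # rdf:about names the subject of every statement inside this element.
    subject = description.get(f"{{{RDF_NS}}}about")
    for prop in description:
        # ElementTree writes tags as "{namespace}name"; joining the two
        # parts gives the full predicate URI.
        predicate = prop.tag[1:].replace("}", "")
        triples.append((subject, predicate, prop.text))
```

This handles only the simple pattern shown on the slide (literal values inside one `rdf:Description`), not RDF/XML in general.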
NBII in SKOS/RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Desert-plants">
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:prefLabel>Desert plants</skos:prefLabel>
<skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Desert-organisms"/>
<skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Plants"/>
<skos:narrower rdf:resource="http://thesaurus.nbii.gov/nbii#Succulents"/>
<skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#"/>
<skos:scopeNote>ORIG Original</skos:scopeNote>
</rdf:Description>
</rdf:RDF>
Deconstructing SKOS/RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:skos="http://www.w3.org/2004/02/skos/core#">
<rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Desert-plants">
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
<skos:prefLabel>Desert plants</skos:prefLabel>
<skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Desert-organisms"/>
<skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Plants"/>
<skos:narrower rdf:resource="http://thesaurus.nbii.gov/nbii#Succulents"/>
<skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#"/>
<skos:scopeNote>ORIG Original</skos:scopeNote>
</rdf:Description>
</rdf:RDF>
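One way to deconstruct the SKOS/RDF record is to list each statement it makes about the concept. The sketch below uses Python's built-in XML parser on a well-formed copy of the record; it assumes the simple layout shown here (one `rdf:Description` whose children are either literals or `rdf:resource` links), and the helper name `local_name` is illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

SKOS_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://thesaurus.nbii.gov/nbii#Desert-plants">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:prefLabel>Desert plants</skos:prefLabel>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Desert-organisms"/>
    <skos:broader rdf:resource="http://thesaurus.nbii.gov/nbii#Plants"/>
    <skos:narrower rdf:resource="http://thesaurus.nbii.gov/nbii#Succulents"/>
    <skos:inScheme rdf:resource="http://thesaurus.nbii.gov/nbii#"/>
    <skos:scopeNote>ORIG Original</skos:scopeNote>
  </rdf:Description>
</rdf:RDF>"""


def local_name(tag):
    """Strip the namespace: '{namespace}broader' -> 'broader'."""
    return tag.split("}", 1)[1]


description = ET.fromstring(SKOS_RDF).find(f"{{{RDF}}}Description")
subject = description.get(f"{{{RDF}}}about")

statements = []
for prop in description:
    # A property either links to another resource or carries a literal value.
    value = prop.get(f"{{{RDF}}}resource") or prop.text
    statements.append((local_name(prop.tag), value))

broader = [value for name, value in statements if name == "broader"]
```

Each entry in `statements` corresponds to one line of the record: the type declaration, the preferred label, two broader links, one narrower link, the scheme, and the scope note.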
For more examples of deconstruction, see packet
Constructing SKOS
See activities online
Section 3: From SKOS to HIVE
Examples of Projects/Communities Using SKOS
W3C’s list of SKOS datasets
http://www.w3.org/2001/sw/wiki/SKOS/Datasets
Library of Congress
http://id.loc.gov/authorities/search/
Europeana
http://www.europeana.eu/portal/
HIVE
http://ils.unc.edu/mrc/hive/
Overview
• HIVE: Helping Interdisciplinary Vocabulary Engineering
• Motivation: Dryad repository
• HIVE: goals, status, and design
• A scenario
• Usability
• Conclusion and questions
HIVE model
<AMG> approach for integrating discipline CVs
Model addressing CV cost, interoperability, and usability constraints (interdisciplinary environment)
Partner Journals
American Society of Naturalists: American Naturalist
Ecological Society of America: Ecology, Ecological Letters, Ecological Monographs, etc.
European Society for Evolutionary Biology: Journal of Evolutionary Biology
Society for Integrative and Comparative Biology: Integrative and Comparative Biology
Society for Molecular Biology and Evolution: Molecular Biology and Evolution
Society for the Study of Evolution: Evolution
Society for Systematic Biology: Systematic Biology
Commercial journals: Molecular Ecology; Molecular Phylogenetics and Evolution
Dryad’s workflow
~ low-burden submission
[Figure: workflow diagram with <M> markers]
Vocabulary needs for Dryad
• Vocabulary analysis – 600 keywords, Dryad partner journals
• Vocabularies: NBII Thesaurus, LCSH, the Getty’s TGN, ERIC Thesaurus, Gene Ontology, ITIS (10 vocabularies)
• Facets: taxon, geographic name, time period, topic, research method, genotype, phenotype…
• Results
– 431 topical terms, exact matches: NBII Thesaurus, 25%; MeSH, 18%
– 531 terms (research method and taxon): LCSH, 22% found exact matches, 25% partial
• Conclusion: Need multiple vocabularies
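The kind of comparison behind this conclusion can be sketched as an exact-match rate per vocabulary versus the rate across all vocabularies combined. Everything below is toy data: the keyword list and the two vocabularies (`vocab_a`, `vocab_b`) are hypothetical stand-ins, not the terms or figures from the Dryad analysis.

```python
def exact_match_rate(keywords, vocabulary):
    """Fraction of keywords that appear verbatim (ignoring case) in one vocabulary."""
    terms = {term.lower() for term in vocabulary}
    return sum(keyword.lower() in terms for keyword in keywords) / len(keywords)


def combined_match_rate(keywords, vocabularies):
    """Fraction of keywords found in at least one of several vocabularies."""
    all_terms = set()
    for vocabulary in vocabularies.values():
        all_terms.update(term.lower() for term in vocabulary)
    return sum(keyword.lower() in all_terms for keyword in keywords) / len(keywords)


# Hypothetical author keywords and two hypothetical vocabularies.
keywords = ["Biodiversity", "Desert plants", "Zygotes", "Phylogeny"]
vocabularies = {
    "vocab_a": ["Biodiversity", "Desert plants"],
    "vocab_b": ["Zygotes", "Biodiversity"],
}

per_vocabulary = {name: exact_match_rate(keywords, terms) for name, terms in vocabularies.items()}
combined = combined_match_rate(keywords, vocabularies)
```

In this toy case each vocabulary alone covers half of the keywords, while the two together cover three quarters, which illustrates why drawing on multiple vocabularies raises coverage.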
HIVE... as a solution
• Address CV (controlled vocabulary) cost, interoperability, and usability constraints
• COST: Expensive to create, maintain, and use
• INTEROPERABILITY: Developed in silos (structurally and intellectually)
• USABILITY: Interface design and functionality limitations have been well documented
HIVE Goals
• Automatic metadata generation approach that dynamically integrates discipline-specific controlled vocabularies encoded with the Simple Knowledge Organization System (SKOS)
• Provide efficient, affordable, interoperable, and user-friendly access to multiple vocabularies during metadata creation activities
• A model that can be replicated: model and service
Three phases of HIVE:
1. Building HIVE
- Vocabulary preparation
- Server development
- Primate Life Histories Working Group
- Wood Anatomy and Wood Density Working Group
2. Sharing HIVE
- Continuing education (empowering information professionals)
3. Evaluating HIVE
- Examining HIVE in Dryad
HIVE Partners
Vocabulary Partners
Library of Congress: LCSH
the Getty Research Institute (GRI): TGN (Thesaurus of Geographic Names)
United States Geological Survey (USGS): NBII Thesaurus, Integrated Taxonomic Information System (ITIS)
Agrovoc Thesaurus
Advisory Board
Jim Balhoff, NESCent
Libby Dechman, LCSH
Mike Frame, USGS
Alistair Miles, Oxford, UK
William Moen, University of North Texas
Eva Méndez Rodríguez, University Carlos III of Madrid
Joseph Shubitowski, Getty Research Institute
Ed Summers, LCSH
Barbara Tillett, Library of Congress
Kathy Wisser, Simmons
Lisa Zolly, USGS
WORKSHOP HOSTS: Columbia Univ.; Univ. of California, San Diego; George Washington University; Univ. of North Texas; Universidad Carlos III de Madrid, Madrid, Spain
HIVE Construction
• HIVE stores millions of concepts from different vocabularies and makes them available on the Web over simple HTTP
– Vocabularies are imported into HIVE using SKOS/RDF format
• HIVE is divided into two modules:
1. HIVE Core
– SKOS/RDF storage and management (SESAME/Elmo)
– SMART HIVE: Automatic Metadata Extraction and Topic Detection (KEA++)
– Concept Retrieval (Lucene)
2. HIVE Web
– Web user interface (GWT, Google Web Toolkit)
– Machine-oriented interface (SOAP and REST)
A scenario
HIVE for scientists, depositors
HIVE for information professionals: curators, professional librarians, archivists, museum catalogers
Meet Amy
• Amy Zanne is a botanist.
• Like every good scientist, she publishes.
• She deposits data in Dryad.
Dryad’s workflow
~ low-burden submission
[Figure: workflow diagram with <M> markers]
Usability
LS and IS students (32 students)
- Understanding HIVE: 3.8 on 5 pt. scale
- Ease of navigation: 4.5
- Concept cloud a good idea: 3.3
- Represent document accurately: 2.0 (simple HIVE), 3.3 (smart HIVE)
Advisory board (10 members)
- Systems/technical folks want integration w/systems, Getty: EAD
- Librarians/KO folks want to see term relationships
- Like tag cloud, want relevance percentages
- Color, placement of box, labels..
White 2009-2010; HIVE Team 2009-2010
Usability [figure] (Huang, 2010)
System usability and flow metrics [figure] (Huang, 2010)
Challenges
Building vs. doing/analysis
• Source for HIVE generation, beyond abstracts
Combining many vocabularies during the indexing/term matching phase is difficult, time consuming, inefficient.
• NLP and machine learning offer promise
Interoperability = dumbing down ontologies
Proof-of-concept/illustrate the differences between HIVE and other vocabulary registries (NCBO and OBO Foundry)
General large team logistics, and having people from multiple disciplines (also the ++)
Summary and next steps
Open source, customizable, SKOS, + hybrid metadata generation
Research and evaluation
- Team project relating to Dryad
- Hollie White: dissertation
- Lesley Skalla: master’s paper
- Craig Willis: MeSH/SKOS conversion
Curator interface design
Workshop evaluation
User’s and developer’s groups on “Google Groups”
Long Term Ecological Research (LTER) Network (http://www.lternet.edu/)
Section 4: Evaluating HIVE
Comparing manual and automatic classification of science abstracts
Join Us @
HIVE Community
http://groups.google.com/group/hive-community
Google Code page (to get your own HIVE)
http://code.google.com/p/hive-mrc/
Questions/Comments
Hollie White
Ryan Scherle
Jane Greenberg
Craig Willis