MASWS: Natural Language and the Semantic Web
Kate Byrne
School of Informatics
14 February 2011
1 / 27
Populating the Semantic Web
  why and how?
  text to RDF
Organising the Triples
  schema design
  open issues and problems
References
2 / 27
Populating the Semantic Web why and how?
The Semantic Web needs data
• Semantic Web first proposed in 1999
• Why hasn’t it taken off the way WWW did?
[Diagram: a vicious circle — “not enough data” → “queries aren’t useful” → “I’ll delay adding my data”]
3 / 27
Populating the Semantic Web why and how?
How to get data
How do we populate the Semantic Web?
[Diagram: create RDF from unstructured and semi-structured text; convert structured data]
4 / 27
Populating the Semantic Web why and how?
Semi-structured content is everywhere
5 / 27
Populating the Semantic Web why and how?
Why convert unstructured text?
• Extracting RDF triples from free text is not easy
• So why is it worth doing?
  • there was knowledge before the WWW (!)
  • high-quality material, often historical
  • big digitisation push in the 1990s (and again in 2011?) – but still challenging to search text effectively
  • natural language is our preferred way of communicating
• Can the web learn to talk in natural language...?
6 / 27
Populating the Semantic Web why and how?
7 / 27
Populating the Semantic Web txt2rdf
Natural Language Processing
• Extraction of semantic content from natural language text
• Systems now available – OpenCalais, Powerset, AlchemyAPI
• To explain the process: part of my PhD work
Think of RDF triples as declarative sentences: RDF nodes = nouns; RDF arcs = verbs
“Kate likes chocolate.”
:kate :likes :chocolate .
prefix : <http://www.ltg.ed.ac.uk/>
8 / 27
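The “nodes = nouns, arcs = verbs” idea above can be sketched in a few lines. This is a toy illustration only (not the lecture’s actual pipeline): it naively splits a subject–verb–object sentence into a triple; the prefix URI is the one from the slide.

```python
# Toy sketch: treat a simple subject-verb-object sentence as an RDF
# triple, with the nouns as nodes and the verb as the arc.
PREFIX = "http://www.ltg.ed.ac.uk/"  # prefix from the slide's example

def sentence_to_triple(sentence: str) -> tuple:
    """Naively split 'Kate likes chocolate.' into (:kate, :likes, :chocolate)."""
    subj, verb, obj = sentence.strip().rstrip(".").split()
    return tuple(":" + w.lower() for w in (subj, verb, obj))

def to_turtle(triple: tuple) -> str:
    """Render one triple as a minimal Turtle snippet."""
    s, p, o = triple
    return f"@prefix : <{PREFIX}> .\n{s} {p} {o} ."

triple = sentence_to_triple("Kate likes chocolate.")
print(to_turtle(triple))
```

Real text is of course nothing like this tidy: the rest of the lecture is about what it takes when sentences are long, noisy and historical.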
Populating the Semantic Web txt2rdf
Converting text to RDF
[Diagram: free text → “fancy NLP processing and RDFisation” → RDF triples]
9 / 27
Populating the Semantic Web txt2rdf
Example – RCAHMS text
site 20
Evidence of a quartz knapping site was found within the confines of the stone circle, and in conjunction with several structures within the inner ring, strongly suggests a domestic site. Besides the quartz implements and corresponding waste, several other artifacts of local origin occurred, including a split pebble axe of greenstone with Shetland Early Bronze Age affinities. B Beveridge, 1972.
Field survey and excavation, as a response to continual wind and marine erosion, was carried out at the Sands of Breckon between 1982 and 1983. HP50NW 11.00 was recorded as stone settings surrounded by occupational debris (Site 22). Excavation revealed midden deposits of an early Iron Age date and a surface scatter of artefacts of mixed dates. The stone settings were tentatively interpreted as the basal stones of long cists. Historic Scotland Archive Project (SW) 2002.
10 / 27
Populating the Semantic Web txt2rdf
The txt2rdf process in a nutshell
• First find the important nouns, or “named entities”:
Example
Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.
• Then look for relationships between them:
Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.
[On the slide, the entities and the relations between them are highlighted in the repeated sentence]
11 / 27
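The first step (“find the important nouns”) can be sketched with pattern- and gazetteer-based entity spotting. The real system uses a trained NER model; the patterns below are purely illustrative, hand-written for this one example sentence.

```python
import re

# Illustrative-only NE patterns: a title+name shape, month-year dates,
# and a one-entry artefact "gazetteer". A trained NER model replaces
# all of this in the actual pipeline.
PATTERNS = {
    "PERSNAME": r"Mr\.\s[A-Z][a-z]+\s[A-Z][a-z]+",
    "DATE": (r"(?:January|February|March|April|May|June|July|August"
             r"|September|October|November|December)\s\d{4}"),
    "ARTEFACT": r"polished stone knives",  # gazetteer entry, not a general rule
}

def find_entities(text: str):
    """Return sorted (entity_text, NE_class) pairs found in the text."""
    found = []
    for ne_class, pat in PATTERNS.items():
        for m in re.finditer(pat, text):
            found.append((m.group(), ne_class))
    return sorted(found)

text = ("Mr. Peter Moar found a small hoard of 5 polished stone knives "
        "on the same patch in May 1946.")
print(find_entities(text))
```

The second step, relation extraction, then looks at pairs of these entities, as the pipeline slide shows.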
Populating the Semantic Web txt2rdf
My txt2rdf pipeline
[Diagram — pipeline: Text documents → Pre-processing (sentence and para split; tokenise; POS tag; multi-word tokens and features) → Named Entity Recognition (trained NER model → list of NEs and classes) → Relation Extraction (set of NE pairs and features; trained RE model → list of relations and classes; remove unwanted relations; attach site ids; generate triples) → RDF translation → Graph of triples]
12 / 27
Populating the Semantic Web txt2rdf
Dealing with EVENTs
Mr. Peter Moar found a small hoard of 5 polished stone knives on the
same patch in May 1946.
• FIND event is n-ary relation:
eventRel(eventAgent, eventAgentRole, eventDate, eventPatient, eventPlace)
find123(“Mr. Peter Moar”,,“May 1946”,“polished stone knives”,“Hill of Shurton”)
Split event relation into RDF triples
:find123 :hasAgent :peterMoar .
:find123 :hasDate :may1946 .
:find123 :hasPatient :polishedStoneKnives .
:find123 :hasLocation :hillOfShurton .
13 / 27
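The split of the n-ary FIND event into binary triples can be sketched directly from the slide’s slot list. A minimal version, with empty slots (here the agent role) dropped rather than emitted as empty triples:

```python
# One property per slot of the n-ary event relation, in the slide's order.
SLOT_PROPERTIES = [":hasAgent", ":hasAgentRole", ":hasDate",
                   ":hasPatient", ":hasLocation"]

def event_to_triples(event_id: str, slots: list) -> list:
    """Map each filled slot of an n-ary event to one (s, p, o) triple."""
    return [(event_id, prop, value)
            for prop, value in zip(SLOT_PROPERTIES, slots)
            if value]  # skip unfilled slots, e.g. the missing agent role

triples = event_to_triples(
    ":find123",
    [":peterMoar", "", ":may1946", ":polishedStoneKnives", ":hillOfShurton"],
)
for s, p, o in triples:
    print(f"{s} {p} {o} .")
```

This is the standard reification-by-event-node pattern for n-ary relations in RDF: the event gets its own URI, and each argument becomes one binary property of it.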
Populating the Semantic Web txt2rdf
Archaeological events
• Example was from RCAHMS data: http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/
• Extracting events allows structured searches
• Question: What do we do with NEs that are not in relations?
For more information: Byrne and Klein (2009), Automatic Extraction of Archaeological Events from Text.
14 / 27
Populating the Semantic Web txt2rdf
Available NLP systems
• OpenCalais: http://viewer.opencalais.com/
• Powerset: now part of Microsoft’s Bing
• AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• Trained on lots of (English) text, mostly news articles
• Many general-purpose NLP tools available online, eg
  • NLTK (http://www.nltk.org/)
  • Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)
  • NaCTeM (http://nactem.ac.uk/)
15 / 27
Organising the Triples schema design
Populating the Semantic Web
  why and how?
  text to RDF
Organising the Triples
  schema design
  open issues and problems
References
16 / 27
Organising the Triples schema design
Do we have a data schema?
How do we populate the Semantic Web?
[Diagram: create RDF from unstructured and semi-structured text; convert structured data]
16 / 27
Organising the Triples schema design
A schema for free text
• So far we’ve looked at instance data (ABox statements)
• Where is the framework to slot instances into?
• Named Entity categories: PERSON, PLACE, DATE, etc
• NE categories become RDFS classes:
Mr. Peter Moar found a small hoard of 5 polished stone knives on the same patch in May 1946.
Entities are typed by their NE class
:peterMoar rdf:type :person .
:found rdf:type :event .
:polishedStoneKnives rdf:type :artefact .
:may1946 rdf:type :date .
17 / 27
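Emitting those typing statements is mechanical once the NER step has assigned classes. A minimal sketch, using the entity-to-class mapping from the slide’s example:

```python
# NE classes become RDFS classes; each recognised entity then gets one
# rdf:type triple. This mapping mirrors the slide's example output.
NE_CLASS = {
    ":peterMoar": ":person",
    ":found": ":event",
    ":polishedStoneKnives": ":artefact",
    ":may1946": ":date",
}

def type_triples(ne_class: dict) -> list:
    """Emit one (entity, rdf:type, class) triple per recognised entity."""
    return [(ent, "rdf:type", cls) for ent, cls in ne_class.items()]

for s, p, o in type_triples(NE_CLASS):
    print(f"{s} {p} {o} .")
```

Note that this answers the TBox question only partially: the classes exist, but how they relate to each other is the hierarchy-design problem of the next slide.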
Organising the Triples schema design
RDFS class hierarchy
• NE categories usually a flat list, not hierarchical
• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT, PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT
• Contrast with structured data
• Hand-designed hierarchy (with rdfs:subClassOf ):
http://www.ltg.ed.ac.uk/tether/
[Diagram: class hierarchy with top-level classes agent (org, person, role), time (date, period), loc (place, sitename, address, grid), classn (sitetype, objtype), siteid, and event (survey, excavation, find, visit, description, creation, alteration)]
18 / 27
Organising the Triples schema design
Property labels
• Can we extract RDF predicate set automatically from text?
• Eg clustering commonly occurring verbs
• But – schema design only has to be done once
• Property set and class hierarchy are related
• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod, hasPatient, hasLocation, hasClassn, hasObject, partOf
• Plus “standard” ones like owl:sameAs, rdfs:seeAlso, skos:broader
• Flat list (no rdfs:subPropertyOf )
19 / 27
Organising the Triples schema design
Existing schemas and vocabularies
• Lots of vocabularies to choose from
  • http://esw.w3.org/topic/VocabularyMarket
• RDFS and OWL are fundamental
• SKOS – translate existing domain thesauri
  • http://www.w3.org/2004/02/skos/
• Reuse where possible; invent local scheme where not
• Incentive to reuse is strong
20 / 27
Organising the Triples open issues and problems
Issues around schema design
• Schema granularity
  • more categories to choose from ⇒ NLP less accurate
  • so class hierarchy quite coarse and flat
• Schema discovery
  • tension between simplicity and query power
  • how will software agents “understand” your schema?
21 / 27
Organising the Triples open issues and problems
Issues around NLP accuracy
• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)
• NLP is not 100% accurate!
  • how many false statements can the Semantic Web stand?
  • “the computer said it, so it must be true”
• Even given canonical URI :peterMoar for our Peter Moar...
• ...can we distinguish him from others with the same name?
22 / 27
Organising the Triples open issues and problems
Grounding local data
• Grounding: tie unique canonical URIs to authoritative reference
• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh the same as http://www.geonames.org/2650225/edinburgh.html?
• Grounding during NLP processing impractical
• Use owl:sameAs? – redundant triples, more complex queries
• Who maintains the authoritative lists of URIs?
• Specialist thesauri, eg archaeological classification terms like “Stone setting”
23 / 27
Organising the Triples open issues and problems
Grounding specialist terminology
[Diagram: RDF graph grounding the classification "stone setting" — nodes include siteid:site20, sitename:sands+of+breckon, address:breckon, event:excavation20w158 (rdf:type event:excavation), date:1982, address:hp50nw+11.01+hp+5304+0519, and sitetype:stone+settings20w179 with rdf:type sitetype:stone+setting, which carries rdfs:label "stone setting" and skos:scopeNote "An arrangement of two or more standing stones"; thesaurus links are skos:broader sitetype:religious+ritual+and+funerary, skos:related sitetype:standing+stone / sitetype:stone+circle / sitetype:stone+row, and rdfs:subClassOf sitetype:. Properties used: :hasLocation, :hasEvent, :hasPeriod, :hasClassn.]
Aim: ground place names, people, organisations, etc.
24 / 27
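The grounding step in the diagram can be sketched as a label lookup against a SKOS-style thesaurus (plain Python; the entry's label, scope note, broader and related terms are taken from the slide, while the dict structure and the `ground` helper are illustrative assumptions):

```python
# Hedged sketch: grounding a free-text site-type mention against a
# SKOS-style thesaurus entry by matching its rdfs:label.

THESAURUS = {
    "sitetype:stone+setting": {
        "label": "stone setting",
        "scopeNote": "An arrangement of two or more standing stones",
        "broader": ["sitetype:religious+ritual+and+funerary"],
        "related": ["sitetype:standing+stone",
                    "sitetype:stone+circle",
                    "sitetype:stone+row"],
    },
}

def ground(mention):
    """Return the thesaurus URI whose label matches the mention, if any."""
    wanted = mention.strip().lower().rstrip("s")  # crude singularisation
    for uri, entry in THESAURUS.items():
        if entry["label"].rstrip("s") == wanted:
            return uri
    return None

print(ground("Stone settings"))  # sitetype:stone+setting
```

In practice the match would be fuzzier than this (stemming, spelling variants), which is part of why grounding during NLP processing is hard.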
Organising the Triples open issues and problems
Linked Data
25 / 27
Organising the Triples open issues and problems
Grounding turns data into Linked Data
Grounding local URIs against "authority" nodes is the next big challenge!
26 / 27
References
References to follow up
• NLP software for Semantic Web applications
• OpenCalais: http://opencalais.com/
• AlchemyAPI: http://www.alchemyapi.com/api/demo.html
• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket
• Case study
• Byrne and Klein (2009) Automatic Extraction of Archaeological Events from Text. CAA2009, Williamsburg, VA.
• General references for NLP
• Natural Language Processing with Python, Bird, Klein and Loper (http://www.nltk.org/book)
• Speech and Language Processing, Jurafsky and Martin (http://www.cs.colorado.edu/~martin/slp.html)
27 / 27