MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why...

108
MASWS Natural Language and the Semantic Web Kate Byrne School of Informatics 14 February 2011 1 / 27

Transcript of MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why...

Page 1: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

MASWSNatural Language and the Semantic Web

Kate Byrne

School of Informatics

14 February 2011

1 / 27

Page 2: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Webwhy and how?text to RDF

Organising the Triplesschema designopen issues and problems

References

2 / 27

Page 3: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

The Semantic Web needs data

• Semantic Web first proposed in 1999

• Why hasn’t it taken off the way WWW did?

3 / 27

Page 4: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

The Semantic Web needs data

• Semantic Web first proposed in 1999

• Why hasn’t it taken off the way WWW did?

3 / 27

Page 5: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

The Semantic Web needs data

• Semantic Web first proposed in 1999

• Why hasn’t it taken off the way WWW did?

not

enough

data

queries

aren’t

usefulmy data

adding

I’ll delay

3 / 27

Page 6: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

How to get data

create convert

How do we populate the Semantic Web?

structured data

and semi−structured

unstructured

4 / 27

Page 7: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

How to get data

create convert

structured data

How do we populate the Semantic Web?

unstructured

and semi−structured

4 / 27

Page 8: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Semi-structured content is everywhere

5 / 27

Page 9: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Semi-structured content is everywhere

5 / 27

Page 10: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Semi-structured content is everywhere

5 / 27

Page 11: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 12: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 13: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 14: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 15: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 16: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 17: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 18: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

Why convert unstructured text?

• Extracting RDF triples from free text is not easy• So why is it worth doing?

• there was knowledge before WWW (!)• high quality material, often historical• big digitisation push in 1990s (and again in 2011?) –• – but still challenging to search text effectively• natural language is our preferred way of communicating

• Can the web learn to talk in natural language...?

6 / 27

Page 19: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web why and how?

7 / 27

Page 20: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Natural Language Processing

• Extraction of semantic content from natural language text

• Systems now available – Open Calais, Powerset, AlchemyAPI

• To explain the process: part of my PhD work

“Kate likes chocolate.”Think of RDF triples as declarative sentences:RDF nodes = nouns; RDF arcs = verbs

8 / 27

Page 21: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Natural Language Processing

• Extraction of semantic content from natural language text

• Systems now available – Open Calais, Powerset, AlchemyAPI

• To explain the process: part of my PhD work

“Kate likes chocolate.”Think of RDF triples as declarative sentences:RDF nodes = nouns; RDF arcs = verbs

8 / 27

Page 22: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Natural Language Processing

• Extraction of semantic content from natural language text

• Systems now available – Open Calais, Powerset, AlchemyAPI

• To explain the process: part of my PhD work

“Kate likes chocolate.”Think of RDF triples as declarative sentences:RDF nodes = nouns; RDF arcs = verbs

8 / 27

Page 23: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Natural Language Processing

• Extraction of semantic content from natural language text

• Systems now available – Open Calais, Powerset, AlchemyAPI

• To explain the process: part of my PhD work

“Kate likes chocolate.”Think of RDF triples as declarative sentences:RDF nodes = nouns; RDF arcs = verbs

8 / 27

Page 24: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Natural Language Processing

• Extraction of semantic content from natural language text

• Systems now available – Open Calais, Powerset, AlchemyAPI

• To explain the process: part of my PhD work

“Kate likes chocolate.”Think of RDF triples as declarative sentences:RDF nodes = nouns; RDF arcs = verbs

:kate:likes

:chocolate

prefix : <http://www.ltg.ed.ac.uk/>

8 / 27

Page 25: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Converting text to RDF

fancy NLP processing

and RDFisation

9 / 27

Page 26: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Example – RCAHMS text

10 / 27

Evidence of a quartz knapping site was found within the confines of the stone

strongly suggests a domestic site.Besides the quartz implements and corresponding waste, several other artifacts of localorigin occurred including a split pebble axe of greenstone with Shetland EarlyBronze Age affinities. B Beveridge, 1972.Field survey and excavation, as a response to continual wind and marineerosion, was carried out at the Sands of Breckon between1982 and 1983.HP50NW 11.00 was recorded as a stone settings surrounded byoccupational debris (Site 22). Excavation revealed midden deposits of anearly Iron Age date and a surface scatter of artefacts of mixed dates. Thestone settings were tentatively interpreted as the basal stones of longcists.Historic Scotland Archive Project (SW) 2002.

circle, and in conjunction with several structures within the inner ring,

site 20

Page 27: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Example – RCAHMS text

10 / 27

site 20

Page 28: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Example – RCAHMS text

10 / 27

site 20

Page 29: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

The txt2rdf process in a nutshell

• First find the important nouns, or “named entities”:

Example

Mr. Peter Moar found a small hoard of 5 polished stone knives on thesame patch in May 1946.

• Then look for relationships between them:

11 / 27

Page 30: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

The txt2rdf process in a nutshell

• First find the important nouns, or “named entities”:

Example

Mr. Peter Moar found a small hoard of 5 polished stone knives on thesame patch in May 1946.

• Then look for relationships between them:

11 / 27

Page 31: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

The txt2rdf process in a nutshell

• First find the important nouns, or “named entities”:

Example

Mr. Peter Moar found a small hoard of 5 polished stone knives on thesame patch in May 1946.

• Then look for relationships between them:

11 / 27

Page 32: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

The txt2rdf process in a nutshell

• First find the important nouns, or “named entities”:

Example

Mr. Peter Moar found a small hoard of 5 polished stone knives on thesame patch in May 1946.

• Then look for relationships between them:

Mr. Peter Moar found a small hoard of 5 polished stone knives on the

same patch in May 1946.

11 / 27

Page 33: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

My txt2rdf pipeline

sentenceand para

split

POS tagtokenise

multi−wordtokens and

features trained NERmodel

list of NEsand

classes

removeunwantedrelations

generatetriples

attachsiteids

trained REmodel

set of NEpairs andfeatures

list ofrelations

and classes

sfsjksjwjvssjkljljs sd’lajoen s

jjs kjdlk lksjlkj sks oihhg sk

jjlkjlj jljbjl skj ekw

RDFtranslation

Graphof triples

Pre−processing Named Entity Recognition

Relation Extraction

Text documents

12 / 27

Page 34: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Dealing with EVENTs

Mr. Peter Moar found a small hoard of 5 polished stone knives on the

same patch in May 1946.

• FIND event is n-ary relation:

eventRel(eventAgent, eventAgentRole, eventDate, eventPatient, eventPlace)

find123(“Mr. Peter Moar”,,“May 1946”,“polished stone knives”,“Hill of Shurton”)

Split event relation into RDF triples

:find123 :hasAgent :peterMoar .:find123 :hasDate :may1946 .:find123 :hasPatient :polishedStoneKnives .:find123 :hasLocation :hillOfShurton .

13 / 27

Page 35: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Dealing with EVENTs

Mr. Peter Moar found a small hoard of 5 polished stone knives on the

same patch in May 1946.

• FIND event is n-ary relation:

eventRel(eventAgent, eventAgentRole, eventDate, eventPatient, eventPlace)

find123(“Mr. Peter Moar”,,“May 1946”,“polished stone knives”,“Hill of Shurton”)

Split event relation into RDF triples

:find123 :hasAgent :peterMoar .:find123 :hasDate :may1946 .:find123 :hasPatient :polishedStoneKnives .:find123 :hasLocation :hillOfShurton .

13 / 27

Page 36: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Archaeological events

• Example was from RCAHMS data:http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/

• Extracting events allows structured searches

• Question: What do we do with NEs that are not in relations?

For more informationByrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text.

14 / 27

Page 37: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Archaeological events

• Example was from RCAHMS data:http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/

• Extracting events allows structured searches

• Question: What do we do with NEs that are not in relations?

For more informationByrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text.

14 / 27

Page 38: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Archaeological events

• Example was from RCAHMS data:http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/

• Extracting events allows structured searches

• Question: What do we do with NEs that are not in relations?

For more informationByrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text.

14 / 27

Page 39: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Archaeological events

• Example was from RCAHMS data:http://canmore.rcahms.gov.uk/en/site/1102/details/hill+of+shurton/

• Extracting events allows structured searches

• Question: What do we do with NEs that are not in relations?

For more informationByrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text.

14 / 27

Page 40: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 41: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 42: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 43: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 44: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 45: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 46: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 47: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Populating the Semantic Web txt2rdf

Available NLP systems

• OpenCalais: http://viewer.opencalais.com/

• Powerset: now part of Microsoft’s Bing

• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• Trained on lots of (English) text, mostly news articles• Many general-purpose NLP tools available online, eg

• nltk (http://www.nltk.org/)• Edinburgh LTG software (http://www.ltg.ed.ac.uk/software)• NaCTeM (http://nactem.ac.uk/)

15 / 27

Page 48: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Populating the Semantic Webwhy and how?text to RDF

Organising the Triplesschema designopen issues and problems

References16 / 27

Page 49: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Do we have a data schema?

create convert

structured data

How do we populate the Semantic Web?

unstructured

and semi−structured

16 / 27

Page 50: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

A schema for free text

• So far we’ve looked at instance data (ABox statements)• Where is framework to slot instances into?• Named Entity categories: PERSON, PLACE, DATE, etc• NE categories become RDFS classes:

Entities are typed by their NE class

:peterMoar rdf:type :person .:found rdf:type :event .:polishedStoneKnives rdf:type :artefact .:may1946 rdf:type :date .

17 / 27

Page 51: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

A schema for free text

• So far we’ve looked at instance data (ABox statements)• Where is framework to slot instances into?• Named Entity categories: PERSON, PLACE, DATE, etc• NE categories become RDFS classes:

Entities are typed by their NE class

:peterMoar rdf:type :person .:found rdf:type :event .:polishedStoneKnives rdf:type :artefact .:may1946 rdf:type :date .

17 / 27

Page 52: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

A schema for free text

• So far we’ve looked at instance data (ABox statements)• Where is framework to slot instances into?• Named Entity categories: PERSON, PLACE, DATE, etc• NE categories become RDFS classes:

Entities are typed by their NE class

:peterMoar rdf:type :person .:found rdf:type :event .:polishedStoneKnives rdf:type :artefact .:may1946 rdf:type :date .

17 / 27

Page 53: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

A schema for free text

• So far we’ve looked at instance data (ABox statements)• Where is framework to slot instances into?• Named Entity categories: PERSON, PLACE, DATE, etc• NE categories become RDFS classes:

Entities are typed by their NE class

:peterMoar rdf:type :person .:found rdf:type :event .:polishedStoneKnives rdf:type :artefact .:may1946 rdf:type :date .

17 / 27

Page 54: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

A schema for free text

• So far we’ve looked at instance data (ABox statements)• Where is framework to slot instances into?• Named Entity categories: PERSON, PLACE, DATE, etc• NE categories become RDFS classes:

Mr. Peter Moar found a small hoard of 5 polished stone knives on the

same patch in May 1946.

Entities are typed by their NE class

:peterMoar rdf:type :person .:found rdf:type :event .:polishedStoneKnives rdf:type :artefact .:may1946 rdf:type :date .

17 / 27

Page 55: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

RDFS class hierarchy

• NE categories usually flat list, not hierarchical• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT,

PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT• Contrast with structured data• Hand-designed hierarchy (with rdfs:subClassOf ):

18 / 27

Page 56: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

RDFS class hierarchy

• NE categories usually flat list, not hierarchical• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT,

PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT• Contrast with structured data• Hand-designed hierarchy (with rdfs:subClassOf ):

18 / 27

Page 57: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

RDFS class hierarchy

• NE categories usually flat list, not hierarchical• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT,

PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT• Contrast with structured data• Hand-designed hierarchy (with rdfs:subClassOf ):

18 / 27

Page 58: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

RDFS class hierarchy

• NE categories usually flat list, not hierarchical• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT,

PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT• Contrast with structured data• Hand-designed hierarchy (with rdfs:subClassOf ):

18 / 27

Page 59: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

RDFS class hierarchy

• NE categories usually flat list, not hierarchical• My set: ORG, PERSNAME, ROLE, SITETYPE, ARTEFACT,

PLACE, SITENAME, ADDRESS, PERIOD, DATE, EVENT• Contrast with structured data• Hand-designed hierarchy (with rdfs:subClassOf ):

http://www.ltg.ed.ac.uk/tether/

timeagentsiteid

rolepersonorg date period

loc classn

sitetype objtype

event

addresssitenameplace

grid

survey excavation find visit description creation alteration

18 / 27

Page 60: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 61: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 62: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 63: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 64: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 65: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 66: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Property labels

• Can we extract RDF predicate set automatically from text?

• Eg clustering commonly occurring verbs

• But – schema design only has to be done once

• Property set and class hierarchy are related

• My set: hasEvent, hasAgent, hasAgentRole, hasPeriod,hasPatient, hasLocation, hasClassn, hasObject, partOf

• Plus “standard” ones like owl:sameAs, rdfs:seeAlso,skos:broader

• Flat list (no rdfs:subPropertyOf )

19 / 27

Page 67: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 68: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 69: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 70: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 71: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 72: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 73: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples schema design

Existing schemas and vocabularies

• Lots of vocabularies to choose from• http://esw.w3.org/topic/VocabularyMarket

• RDFS and OWL are fundamental• SKOS – translate existing domain thesauri

• http://www.w3.org/2004/02/skos/

• Reuse where possible; invent local scheme where not

• Incentive to reuse is strong

20 / 27

Page 74: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 75: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 76: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 77: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 78: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 79: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around schema design

• Schema granularity• more categories to choose from⇒ NLP less accurate• so class hierarchy quite coarse and flat

• Schema discovery• tension between simplicity and query power• how will software agents “understand” your schema?

21 / 27

Page 80: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 81: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 82: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 83: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 84: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 85: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Issues around NLP accuracy

• Disambiguation of extracted NEs (Mr Peter Moar, P Moar, etc)• NLP is not 100% accurate!

• how many false statements can Semantic Web stand?• “the computer said it, so it must be true”

• Even given canonical URI :peterMoar for our Peter Moar...

• ... can we distinguish him from others with same name?

22 / 27

Page 86: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 87: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 88: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 89: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 90: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 91: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding local data

• Grounding: tie unique canonical URIs to authoritative reference

• Is http://www.ltg.ed.ac.uk/tether/Loc/Place#edinburgh same ashttp://www.geonames.org/2650225/edinburgh.html?

• Grounding during NLP processing impractical

• Use owl:sameAs? – redundant triples, more complex queries

• Who maintains the authoritative lists of URIs?

• Specialist thesauri, eg archaeological classification terms like“Stone setting”

23 / 27

Page 92: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding specialist terminology

siteid:site20

address:breckon

sitename:sands+of+breckon

date:1982

event:excavation

event:excavation20w158

address:hp50nw+11.01+hp+5304+0519

sitetype:stone+settings20w179

sitetype:stone+setting

"stone setting"

"An arrangement of twoor more standing stones"

sitetype:religious+ritual+and+funerary

sitetype:standing+stone

sitetype:stone+circle

sitetype:stone+row

sitetype:

:hasLocation

:hasLocation

:hasPeriod

rdf:type

:hasEvent

:hasLocation

:hasClassn

rdf:type

rdfs:label

skos:scopeNote

skos:broader

skos:related

rdfs:subClassOf

Aim: ground place names, people, organisations, etc.24 / 27

Page 93: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding specialist terminology

siteid:site20

address:breckon

sitename:sands+of+breckon

date:1982

event:excavation

event:excavation20w158

address:hp50nw+11.01+hp+5304+0519

"An arrangement of twoor more standing stones"

sitetype:religious+ritual+and+funerary

sitetype:standing+stone

sitetype:stone+circle

sitetype:stone+row

sitetype:stone+settings20w179

sitetype:

"stone setting"

sitetype:stone+setting

:hasClassn

rdfs:label

skos:scopeNote

skos:broader

skos:related

rdf:type

rdfs:subClassOf

:hasLocation

:hasLocation

:hasPeriod

rdf:type

:hasEvent

:hasLocation

Aim: ground place names, people, organisations, etc.24 / 27

Page 94: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding specialist terminology

siteid:site20

address:breckon

sitename:sands+of+breckon

date:1982

event:excavation

event:excavation20w158

address:hp50nw+11.01+hp+5304+0519

"An arrangement of twoor more standing stones"

sitetype:religious+ritual+and+funerary

sitetype:standing+stone

sitetype:stone+circle

sitetype:stone+row

sitetype:stone+settings20w179

sitetype:

"stone setting"

sitetype:stone+setting

:hasClassn

rdfs:label

skos:scopeNote

skos:broader

skos:related

rdf:type

rdfs:subClassOf

:hasLocation

:hasLocation

:hasPeriod

rdf:type

:hasEvent

:hasLocation

Aim: ground place names, people, organisations, etc.24 / 27

Page 95: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Linked Data

25 / 27

Page 96: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding turns data into Linked Data

26 / 27

Page 97: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding turns data into Linked Data

26 / 27

Page 98: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding turns data into Linked Data

26 / 27

Page 99: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

Organising the Triples open issues and problems

Grounding turns data into Linked Data

grounding local URIs

against "authority" nodes

is the

next big challenge!

26 / 27

Page 100: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 101: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 102: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 103: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 104: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 105: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 106: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 107: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27

Page 108: MASWS - Natural Language and the Semantic Web · Populating the Semantic Web why and how? Why convert unstructured text? Extracting RDF triples from free text is not easy So why is

References

References to follow up

• NLP software for Semantic Web applications• OpenCalais: http://opencalais.com/• AlchemyAPI: http://www.alchemyapi.com/api/demo.html

• VocabularyMarket: http://esw.w3.org/topic/VocabularyMarket• Case study

• Byrne and Klein (2009) Automatic Extraction of ArchaeologicalEvents from Text. CAA2009, Williamsburg, VA.

• General references for NLP• Natural Language Processing with Python, Bird, Klein and Loper

(http://www.nltk.org/book)• Speech and Language Processing, Jurafsky and Martin

(http://www.cs.colorado.edu/∼martin/slp.html)

27 / 27