[DCSB] Dr Francesco Mambrini (CHS, USA/DAI, Germany), "Treebanking in the World of Thucydides....

What digital corpora for Ancient History? Linguistic Annotation of Thucydides 1.98-118. Treebanking in the World of Thucydides: Linguistic annotation for the Hellespont Project. Francesco Mambrini, Center For Hellenic Studies / Deutsches Archäologisches Institut. November 20, 2012. Hellespont Project.

Page 1: [DCSB] Dr Francesco Mambrini (CHS, USA/DAI, Germany), "Treebanking in the World of Thucydides. Linguistic annotation for the Hellespont Project".

What digital corpora for Ancient History? Linguistic Annotation of Thucydides 1.98-118

Treebanking in the World of Thucydides: Linguistic annotation for the Hellespont Project

Francesco Mambrini

Center For Hellenic Studies

Deutsches Archäologisches Institut

November 20 2012

Hellespont Project

Page 2:

Outline

1 What digital corpora for Ancient History?
  The questions at hand
  Data-driven approaches

2 Linguistic Annotation of Thucydides 1.98-118
  The Hellespont Project
  Examples

Page 4:

A web of knowledge

Figure: A simplified model

Page 5:

Interconnectedness: the problem

The multivalent nature of historical thought [...] eludes the keyword-indexed approach to the Web today on offer through Google and other search engines. Though we can summon up an exhaustive list of Web resources that contain the words “Gallipoli” and “sources”, today’s Web cannot effectively respond to a basic historical question such as, “which sources attest the Gallipoli Campaign of World War I?”

B. Robertson

Page 6:

CIDOC Conceptual Reference Model

Objects represented as being part of events

Figure: by Doerr and Stead 2009
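The event-centred idea behind the CIDOC CRM can be sketched in a few lines of Python. This is a toy illustration, not the Hellespont Project's data model: every `ex:` identifier and the in-memory triple list are invented for the example, while the class and property labels (E5 Event, E31 Document, P7 took place at, P12 occurred in the presence of, P70 documents) follow the CIDOC CRM. It shows how linking sources and objects to an event node makes a question such as "which sources attest the Gallipoli Campaign?" answerable, where a keyword match on "Gallipoli" would also return a document about the place only.

```python
# Toy triple store: (subject, property, object).
# All "ex:" identifiers are hypothetical and exist only for this sketch.
TRIPLES = [
    ("ex:gallipoli_campaign", "rdf:type", "crm:E5_Event"),
    ("ex:gallipoli_campaign", "crm:P7_took_place_at", "ex:gallipoli_peninsula"),
    ("ex:gallipoli_campaign", "crm:P12_occurred_in_the_presence_of", "ex:field_gun"),
    ("ex:war_diary", "rdf:type", "crm:E31_Document"),
    ("ex:war_diary", "crm:P70_documents", "ex:gallipoli_campaign"),
    ("ex:travel_guide", "rdf:type", "crm:E31_Document"),
    # Mentions Gallipoli as a place, but does not attest the campaign.
    ("ex:travel_guide", "crm:P70_documents", "ex:gallipoli_peninsula"),
]


def sources_attesting(event, triples=TRIPLES):
    """Return the documents linked to `event` through P70 documents."""
    return sorted(s for s, p, o in triples
                  if p == "crm:P70_documents" and o == event)


def objects_present_at(event, triples=TRIPLES):
    """Return the things linked to `event` through P12 occurred in the presence of."""
    return sorted(o for s, p, o in triples
                  if s == event and p == "crm:P12_occurred_in_the_presence_of")


print(sources_attesting("ex:gallipoli_campaign"))
print(objects_present_at("ex:gallipoli_campaign"))
```

Because both the document and the object hang off the same event node, the event acts as the hub that connects textual and material evidence.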

Page 7:

One more problem! Know what our sources are!

Big and complex works; e.g. Thucydides:
6,126 sentences, 167,512 words
ca. 30 years of war, + 50 years in digression, references that go back to before the Trojan War!

Unstructured natural language
Written in Ancient Greek
Controversial (interpretation and textual reconstruction)
Literary work (= shaped by discursive and ideological strategies)

Page 9:

Ontology modelling for the study of ritual structures (Ontologiemodellierung für die Erforschung von Ritualstrukturen; SFB 619, Heidelberg)

Figure: Event extraction from texts

Page 10:

NLP Pipeline

NLP Process Ancient Greek?

Chunking

Lemmatization

POS-tagging

Syntactic parsing

Word-sense disambiguation

Co-reference resolution

Semantic role annotation

Page 11:

Using and enhancing the available resources: The Ancient Greek Dependency Treebank

AGDT: treebank with word-by-word morphological and dependency-based syntactic description

A step forward: semantic information
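What a word-by-word morphological and dependency description looks like in practice can be sketched as follows. The XML below is a hand-made toy sentence laid out in the style of the AGDT files (one `word` element per token with `form`, `lemma`, `postag`, `head` and `relation` attributes); it is not copied from the treebank, and the tags and relations are my own illustrative annotation. The nine-character `postag` encodes part of speech, person, number, tense, mood, voice, gender, case and degree; `head="0"` marks the root of the sentence.

```python
import xml.etree.ElementTree as ET

# Hand-made toy sentence in an AGDT-like layout (illustrative annotation only).
SAMPLE = """
<treebank>
  <sentence id="1">
    <word id="1" form="οἱ" lemma="ὁ" postag="l-p---mn-" head="2" relation="ATR"/>
    <word id="2" form="Ἀθηναῖοι" lemma="Ἀθηναῖος" postag="n-p---mn-" head="3" relation="SBJ"/>
    <word id="3" form="ἦλθον" lemma="ἔρχομαι" postag="v3paia---" head="0" relation="PRED"/>
  </sentence>
</treebank>
"""


def read_words(xml_text):
    """Parse the XML and return one dict per word, with numeric id and head."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "id": int(w.get("id")),
            "form": w.get("form"),
            "lemma": w.get("lemma"),
            "postag": w.get("postag"),
            "head": int(w.get("head")),
            "relation": w.get("relation"),
        }
        for w in root.iter("word")
    ]


def dependents(words, head_id, relation=None):
    """Return the words governed by `head_id`, optionally filtered by relation."""
    return [w for w in words
            if w["head"] == head_id
            and (relation is None or w["relation"] == relation)]


def outline(words, head_id=0, depth=0):
    """Render the dependency tree as indented lines, starting from the root."""
    lines = []
    for w in dependents(words, head_id):
        lines.append("  " * depth + f'{w["form"]} [{w["relation"]}]')
        lines.extend(outline(words, w["id"], depth + 1))
    return lines


words = read_words(SAMPLE)
print("\n".join(outline(words)))
```

The same structure leaves room for the "step forward": a further attribute per word could carry semantic information alongside the syntactic relation.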

Page 12:

A syntactic tree: Thuc. 1.89.1

Page 14:

A case study: Athens, 479-431 BCE

Goal:
Connecting textual and archaeological sources in the Perseus DL and Arachne via CIDOC-CRM

Steps:
Enriching the text of one source (Thucydides) with linguistic and historical information
Identifying and marking events in the text
  manually
  with a data-driven approach
Integrating secondary literature (through data mining algorithms)
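The step of identifying and marking events with a data-driven approach could, under strong simplifying assumptions, start from the dependency annotation. The sketch below is not the project's method: it uses a naive heuristic (every word labelled `PRED` is a candidate event, and its `SBJ` and `OBJ` dependents are candidate participants) and a hand-made, simplified clause rather than the transmitted text of Thucydides. The `Word` and `EventCandidate` types and the `reviewed` flag are hypothetical names introduced for the example; the flag stands for the manual check that would follow.

```python
from dataclasses import dataclass, field


@dataclass
class Word:
    id: int
    form: str
    lemma: str
    head: int        # id of the governing word, 0 for the root
    relation: str    # syntactic label, e.g. PRED, SBJ, OBJ


@dataclass
class EventCandidate:
    predicate: str
    participants: dict = field(default_factory=dict)
    reviewed: bool = False   # to be set by a human annotator


def extract_event_candidates(words):
    """Turn each PRED node into an event candidate with its SBJ/OBJ lemmas."""
    candidates = []
    for w in words:
        if w.relation != "PRED":
            continue
        roles = {}
        for d in words:
            if d.head == w.id and d.relation in ("SBJ", "OBJ"):
                roles.setdefault(d.relation, []).append(d.lemma)
        candidates.append(EventCandidate(predicate=w.lemma, participants=roles))
    return candidates


# Simplified toy clause ("the Athenians took Eion"), annotated by hand.
CLAUSE = [
    Word(1, "Ἀθηναῖοι", "Ἀθηναῖος", head=3, relation="SBJ"),
    Word(2, "Ἠιόνα", "Ἠιών", head=3, relation="OBJ"),
    Word(3, "εἷλον", "αἱρέω", head=0, relation="PRED"),
]

for candidate in extract_event_candidates(CLAUSE):
    print(candidate)
```

Keeping the output as unreviewed candidates reflects the two routes listed above: the automatic pass proposes, the manual pass confirms or corrects.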

Page 15:

Toward a 3-level scenario: Morphology and Syntax

Page 16:

Toward a 3-level scenario: + semantic and pragmatic information

Page 18:

With tectogrammatical annotation:

Our text is:
1 easier to browse for content-related search (easier to use in digital environments)
2 more informative on historically relevant questions
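Why a tectogrammatical layer makes content-related search easier can be illustrated with a small sketch. The clauses below are invented toy data, not annotation from the project, and lemmas are given as English glosses for readability. Each argument carries a simplified surface relation and a functor; `ACT` (actor) and `PAT` (patient) are functor labels used in tectogrammatical annotation of the Prague Dependency Treebank kind. The `find` helper is a hypothetical name for the example.

```python
# Invented toy clauses: each has a predicate and arguments annotated on two levels.
CLAUSES = [
    {"ref": "toy-1", "predicate": "capture",          # active clause
     "args": [{"lemma": "Athenians", "relation": "SBJ", "functor": "ACT"},
              {"lemma": "Eion", "relation": "OBJ", "functor": "PAT"}]},
    {"ref": "toy-2", "predicate": "capture",          # passive clause
     "args": [{"lemma": "Eion", "relation": "SBJ", "functor": "PAT"},
              {"lemma": "Athenians", "relation": "ADV", "functor": "ACT"}]},
    {"ref": "toy-3", "predicate": "besiege",
     "args": [{"lemma": "Persians", "relation": "SBJ", "functor": "ACT"},
              {"lemma": "Athenians", "relation": "OBJ", "functor": "PAT"}]},
]


def find(clauses, lemma, level, label):
    """Return refs of clauses where `lemma` carries `label` on the given level.

    `level` is either "relation" (surface syntax) or "functor" (semantic role).
    """
    return [c["ref"] for c in clauses
            if any(a["lemma"] == lemma and a[level] == label for a in c["args"])]


# Surface syntax only finds the active clause ...
print(find(CLAUSES, "Athenians", "relation", "SBJ"))
# ... while the functor finds every clause in which the Athenians act.
print(find(CLAUSES, "Athenians", "functor", "ACT"))
```

A query on the syntactic subject misses the passive clause, whereas a query on the functor retrieves both clauses in which the Athenians are the agent, which is the kind of question a historian is likely to ask.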

Page 21:

Conclusions

1 Currently, our literary sources are not structured for semantic, event-based queries

2 NLP processes for event extraction are not yet capable of handling raw Ancient Greek texts

3 NLP tools and techniques are adaptable to the task:
  they provide standards
  they help and speed up manual annotation
  (incidentally) they add a lot of information on linguistic aspects of the documentary sources