Transcript of Ihr june15-evans
NLP and Data Mining: From ChartEx to Traces Through Time and beyond
Dr Roger Evans
Natural Language Technology Group & Cultural Informatics Research Group, University of Brighton
ChartEx Architecture
[Diagram: ChartEx architecture components. Runtime: 1000's of charters, natural language processing, data mining, ChartEx repository, virtual workbench. Development: expert elicitation, markup scheme, 5-10 charters, manual markup of 100-200 charters, marked-up charters, NLP development, DM development, VWB requirements, VWB development, repository development.]
Runtime architecture
TTT architecture
[Diagram: TTT runtime pipeline components. Documents, shallow language processing, extract content, deep language processing, record linkage, optimisation/statistics, visualisation.]
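The pipeline stages above can be sketched as a chain of functions. This is a toy illustration only: the function names and the naive "capitalised word" extraction are invented stand-ins, not the actual TTT implementation.

```python
# Illustrative sketch of a TTT-style pipeline: shallow processing ->
# extract content -> deep processing -> record linkage.
# All names and logic are hypothetical stand-ins for the real stages.

def shallow_processing(doc: str) -> list[str]:
    """Tokenise the raw document text (stand-in for shallow NLP)."""
    return doc.split()

def extract_content(tokens: list[str]) -> dict:
    """Treat capitalised tokens as candidate names (toy extraction)."""
    return {"names": [t for t in tokens if t[:1].isupper()]}

def deep_processing(record: dict) -> dict:
    """Normalise the extracted content (stand-in for deep analysis)."""
    record["names"] = sorted({n.strip(".,") for n in record["names"]})
    return record

def record_linkage(records: list[dict]) -> list[tuple[dict, dict]]:
    """Link any two records that share a normalised name."""
    links = []
    for i, a in enumerate(records):
        for b in records[i + 1:]:
            if set(a["names"]) & set(b["names"]):
                links.append((a, b))
    return links

docs = ["John of York grants land.", "A charter witnessed by John of York."]
records = [deep_processing(extract_content(shallow_processing(d))) for d in docs]
links = record_linkage(records)  # the two toy documents share "John of York"
```

The point of the composition is that each stage only consumes the previous stage's output, so individual stages can be improved (or swapped) independently.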
Comparison
[Diagram: the two runtime architectures side by side. ChartEx: 1000's of charters, natural language processing, data mining, ChartEx repository, virtual workbench. TTT: documents, shallow language processing, extract content, deep language processing, record linkage, optimisation/statistics, visualisation.]
• Range of data: medieval charters, English and Latin, free text (ChartEx) vs. early and modern, text and data (TTT)
• Analytic complexity: focus on people, detailed view vs. focus on places, broad relational view
• Target users: 'researchers' in a controlled environment vs. web users with less control vs. (heritage) enterprise, bespoke
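Both projects ultimately depend on linking mentions of the same entity across documents whose spellings vary (medieval Latin forms vs. modernised English forms, for instance). A minimal sketch of fuzzy name matching using Python's standard library; the name variants are invented for illustration:

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Crude string similarity in [0, 1] between two name spellings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Invented spelling variants: a medievalised vs. a modernised form.
variants = name_similarity("Willelm of Yorke", "William of York")
unrelated = name_similarity("Willelm of Yorke", "Johannes Smith")
# The variant pair scores higher than the unrelated pair, but neither
# score is "the answer": a threshold must still be chosen.
```

Real record-linkage systems use much richer evidence (dates, places, roles, co-occurring names), but the same point holds: the output is a score, not a verdict.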
What can Computer Science do?
• State of the art is broadly based on statistics
• Answers are always only approximate
• Different kinds of approximation:
  • Precision: focus on making sure answers are right (but may miss some)
  • Recall: focus on getting as many right answers as possible (but may give some wrong answers too)
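The precision/recall distinction can be made concrete with a small worked example; the place names below are invented for illustration:

```python
def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision: fraction of predicted answers that are correct.
    Recall: fraction of correct answers that were predicted."""
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {"york", "beverley", "durham", "ripon"}   # the right answers
predicted = {"york", "beverley", "london"}       # 2 right, 1 wrong, 2 missed
p, r = precision_recall(predicted, gold)         # p = 2/3, r = 2/4
```

A system tuned for precision would rather drop "london" and miss "durham"; a system tuned for recall would rather keep "london" if that also recovers "durham".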
What does Digital Humanities want?
• Perfect results?
  • How do you respond if we say we can't do that?
• Control over the tradeoff?
  • How easy is it to understand what control you have?
  • Does this help you interpret the results you get?
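"Control over the tradeoff" typically means a single dial: an acceptance threshold on the system's confidence scores. A toy illustration with invented scores, showing that raising the threshold buys precision at the cost of recall:

```python
# Hypothetical linkage candidates: (confidence score, is it really a match?)
candidates = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.70, True), (0.60, False), (0.40, True),
]

def evaluate(threshold: float) -> tuple[float, float]:
    """Precision and recall when accepting scores >= threshold."""
    accepted = [m for s, m in candidates if s >= threshold]
    if not accepted:
        return 0.0, 0.0
    total_true = sum(m for _, m in candidates)
    precision = sum(accepted) / len(accepted)
    recall = sum(accepted) / total_true
    return precision, recall

strict_p, strict_r = evaluate(0.9)   # fewer links, but all of them right
loose_p, loose_r = evaluate(0.5)     # more true links found, more noise too
```

The humanities-facing question is whether exposing this dial actually helps: a threshold is easy to move but hard to interpret without seeing what it includes and excludes.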
Where are we now, and where are we going?
• Human in the loop
  • Tools always require human interpretation of results
  • Is this really just a cop-out by computer scientists?
  • Or just a pragmatic expression of the state of the art?
• Deskilling
  • Do we really mean an expert in the loop?
• Conversations
  • Are we really only just at the point of negotiating what is possible and what is required?