The Archaeotools project, faceted classification and natural language processing in an...
![Page 1: The Archaeotools project, faceted classification and natural language processing in an archaeological context. University of York, April 2008.](https://reader036.fdocuments.in/reader036/viewer/2022062803/56649f435503460f94c63acc/html5/thumbnails/1.jpg)
The Archaeotools project, faceted classification and natural language processing in an archaeological context.
University of York, April 2008
AHRC-EPSRC-JISC eScience research grants scheme:
AIM: To allow archaeologists to discover, share and analyse datasets and legacy publications which have hitherto been very difficult to integrate into existing digital frameworks
BUILDS UPON: Common Information Environment Enhanced Geospatial browser
PARTNERS: Natural Language Processing Research Group, Department of Computer Science, University of Sheffield
Joint Information Systems Committee
Three distinct Workpackages:
• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk
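As a rough illustration of what browsing by the four primary facets could look like, here is a minimal Python sketch. It is not the project's implementation: the record layout, the sample values and the function names `filter_records` and `facet_counts` are all invented for this example.

```python
from collections import Counter

# The four primary facets named on the slide (lower-cased keys are an assumption).
FACETS = ("what", "where", "when", "media")

def filter_records(records, selection):
    """Keep only records that match every facet value chosen so far."""
    return [r for r in records
            if all(r.get(facet) == value for facet, value in selection.items())]

def facet_counts(records, selection):
    """For each facet not yet chosen, count the values left in the filtered set."""
    hits = filter_records(records, selection)
    return {facet: Counter(r[facet] for r in hits if r.get(facet))
            for facet in FACETS if facet not in selection}

# Invented sample records, for illustration only.
RECORDS = [
    {"what": "CASTLE", "where": "North Yorkshire", "when": "MEDIEVAL", "media": "Report"},
    {"what": "CASTLE", "where": "Gwynedd", "when": "MEDIEVAL", "media": "Photograph"},
    {"what": "VILLA", "where": "North Yorkshire", "when": "ROMAN", "media": "Report"},
]

# Choosing What = CASTLE narrows the set and updates the counts on the other facets.
counts = facet_counts(RECORDS, {"what": "CASTLE"})
```

Each further selection is simply added to the `selection` dictionary, so the counts shown for the remaining facets always describe the records still in view.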
Work package 1
• Datasets include:
– National Monuments Records (Scotland, Wales, England)
– Excavation Index (EH)
– Archive Holdings
– Local Authority Historic Environment Records
• Thesauri include:
– Thesaurus of Monument Types (TMT)
– Thesaurus of Object Types
– MIDAS Period list
– UK Government list of administrative areas, County, District, Parish (CDP) – Not MIDAS
[Architecture diagram. Labels: Oracle RDBMS; MIDAS XML Record; Information Extraction (shown twice); RDF Resource; Knowledge triple store; XML Docs of Thesaurus; When, Where, What ontologies as entries to faceted index; Query; User Interface; Input (shown twice)]
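The diagram labels suggest that XML records are turned into RDF-style statements held in a triple store. The sketch below shows that general idea only. The element names in `SAMPLE` are hypothetical and do not follow the real MIDAS XML schema, plain Python tuples stand in for RDF triples, and a list stands in for the triple store.

```python
import xml.etree.ElementTree as ET

# Hypothetical record layout, invented for this example (not the MIDAS schema).
SAMPLE = """<record id="rec-1">
  <what>CASTLE</what>
  <where>North Yorkshire</where>
  <when>MEDIEVAL</when>
</record>"""

def record_to_triples(xml_text):
    """Turn each non-empty child element into a (subject, predicate, object) tuple."""
    root = ET.fromstring(xml_text)
    subject = root.get("id")
    triples = []
    for child in root:
        value = (child.text or "").strip()
        if value:
            triples.append((subject, child.tag, value))
    return triples

def query(triples, predicate, value):
    """Return the sorted subjects that carry the given predicate/value pair."""
    return sorted({s for s, p, o in triples if p == predicate and o == value})

# A plain list plays the role of the triple store here.
TRIPLES = record_to_triples(SAMPLE)
matches = query(TRIPLES, "when", "MEDIEVAL")
```

A real deployment would use proper RDF identifiers and a dedicated store; the point here is only that each facet value becomes one statement about a record.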
“WHAT”
• Records that have no subject information
• Records that use terms not found in TMT, so these records cannot be indexed (6,442 unique terms)
[Two charts, in slide order: 19,269 of 1,001,407 records (2%); 101,507 of 1,001,407 records (10.1%)]
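The two problem categories on this slide can be pictured as a lookup against a controlled vocabulary. The following Python sketch is an assumption-laden illustration: the tiny `THESAURUS` set stands in for TMT, the sample records are invented, and treating a record as unresolved only when none of its terms match is a choice made for this example, not a rule stated in the slides.

```python
from collections import Counter

def classify(record_terms, thesaurus):
    """Label a record by how its subject terms relate to the thesaurus."""
    if not record_terms:
        return "no_subject"      # nothing to index on
    if any(term.strip().upper() in thesaurus for term in record_terms):
        return "indexable"       # at least one term is a thesaurus entry
    return "unresolved"          # terms present, none found in the thesaurus

def share(count, total):
    """Percentage of the total, rounded to one decimal place."""
    return round(100 * count / total, 1)

# Invented stand-in vocabulary and records, for illustration only.
THESAURUS = {"CASTLE", "VILLA", "BARROW"}
RECORDS = [["castle"], [], ["Villa", "bath house"], ["mystery mound"]]

tally = Counter(classify(terms, THESAURUS) for terms in RECORDS)

# The same helper applied to the counts quoted on the slide.
no_subject_share = share(19_269, 1_001_407)
unresolved_share = share(101_507, 1_001_407)
```

With the slide's counts, `share` gives 1.9 and 10.1, which agrees with the rounded 2% and 10.1% shown on the charts.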
“WHEN”
• Records that have no temporal information
• Records that use period terms not found in MIDAS, so these records cannot be indexed (457 types of irresolvable dates)
[Two charts, in slide order: 292,793 of 1,001,407 records (29.2%); 114,505 of 1,001,407 records (11.4%)]
1066, 1001-1100, 11th Centuary, C11, 11C, Eleventh Century
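The date variants listed on the slide all point at the same century, which is why free-text dates are hard to index. One way to resolve such strings is to map each to a year range, as in the Python sketch below. The patterns, the function names and the convention that century n runs from year (n-1)*100+1 to n*100 are assumptions made for this example, not the project's method.

```python
import re

# Ordinal words for centuries; extend as needed (an illustrative subset).
ORDINAL_WORDS = {
    "first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5,
    "sixth": 6, "seventh": 7, "eighth": 8, "ninth": 9, "tenth": 10,
    "eleventh": 11, "twelfth": 12, "thirteenth": 13, "fourteenth": 14,
    "fifteenth": 15, "sixteenth": 16, "seventeenth": 17,
    "eighteenth": 18, "nineteenth": 19, "twentieth": 20,
}

def century_range(n):
    """Assumed convention: the 11th century is 1001-1100."""
    return ((n - 1) * 100 + 1, n * 100)

def normalise(text):
    """Return (start_year, end_year) for a date string, or None if unresolved."""
    s = text.strip().lower()
    m = re.fullmatch(r"(\d{1,4})\s*-\s*(\d{1,4})", s)          # 1001-1100
    if m:
        return (int(m.group(1)), int(m.group(2)))
    m = re.fullmatch(r"c(\d{1,2})|(\d{1,2})c", s)              # C11 or 11C
    if m:
        return century_range(int(m.group(1) or m.group(2)))
    m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)\s+cent\w*", s)  # 11th Century, any spelling after "cent"
    if m:
        return century_range(int(m.group(1)))
    m = re.fullmatch(r"([a-z]+)\s+cent\w*", s)                 # Eleventh Century
    if m and m.group(1) in ORDINAL_WORDS:
        return century_range(ORDINAL_WORDS[m.group(1)])
    if re.fullmatch(r"\d{3,4}", s):                            # single year such as 1066
        return (int(s), int(s))
    return None

VARIANTS = ["1001-1100", "11th Centuary", "C11", "11C", "Eleventh Century"]
resolved = {v: normalise(v) for v in VARIANTS}
```

Matching on `cent\w*` rather than the exact word keeps the slide's "Centuary" spelling resolvable, and a single year such as 1066 becomes a one-year range that falls inside the century range.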
“WHERE”
• Records that have no spatial information
• Records that use terms not found in CDP, so these records cannot be indexed.
[Two charts, in slide order: 11,126 of 1,001,407 records (1.1%); 245,601 of 1,001,407 records (24.5%)]
linear
Three distinct Workpackages:
• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk
XML tagging of semantic content
CIDOC: CRM
University Researchers
Local authority curators
Three distinct Workpackages:
• Workpackage 1 – Advanced Faceted Classification / Geo-spatial browser – 1m+ records; 4 primary facets (What, Where, When and Media).
• Workpackage 2 – Natural language processing / Data-mining of Grey Literature; plus tagging
• Workpackage 3 – Data-mining of Historic Literature; plus geoXwalk
http://ads.ahds.ac.uk/project/archaeotools/