The Mint Mapping tool and the MoRe aggregator
The Mint Mapping tool: Vassilis Tzouvaras, National Technical University of Athens
The MoRe aggregator: Dimitris Gavrilis, Digital Curation Unit - IMIS, Athena Research Center
LoCloud is funded by the European Commission's ICT Policy Support Programme
Cultural Heritage Content
• Diversity of cultural heritage content
– Numerous metadata schemas to annotate content (LIDO, CIDOC-CRM, EAD, METS)
• Massive digitization and annotation activities are in progress
• Need for interoperability
MINT Mapping Tool
• Lets users map their own metadata schemas to reference domain models
• Follows a typical web-based architecture
• It was developed for ATHENA, but it is currently used for EUScreen, CARARE, Judaica, ECLAP, DCA and Linked Heritage
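As a rough illustration of what a schema mapping amounts to, the sketch below applies a hand-written mapping from a provider's own field names to target elements. The field names, the `MAPPING` table and `apply_mapping` are hypothetical and made up for this example; MINT is a web-based tool and this is not its API.

```python
# Hypothetical mapping from a provider's own field names to target elements.
MAPPING = {
    "object_name": "dc:title",
    "artist": "dc:creator",
    "year": "dc:date",
}

def apply_mapping(source_record, mapping):
    """Return (target record, unmapped source fields).

    Unmapped fields are reported rather than dropped silently, so the
    person doing the mapping can see what is still missing.
    """
    target, unmapped = {}, []
    for field_name, value in source_record.items():
        if field_name in mapping:
            target.setdefault(mapping[field_name], []).append(value)
        else:
            unmapped.append(field_name)
    return target, unmapped

# Made-up source record for illustration only.
record = {"object_name": "Windmill at dusk", "artist": "Unknown", "inventory_no": "A-17"}
target, unmapped = apply_mapping(record, MAPPING)
```

Here `inventory_no` has no mapping yet, so it ends up in `unmapped` instead of the target record.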
MINT 2 – What’s new?
• The backend was reconstructed for better performance
– File size for imports is extended
• The frontend was updated
– New interface
– Workflow is integrated in UI
– Facilitated browsing of input and target schema
MoRe Overall Architecture
[Architecture diagram. Components shown: Registry; Apache Cassandra cluster; Fedora-commons; Temporary storage; Vocabulary services; Storage; JMS logging; Messaging; Core services; Enrichment service management; Entity matching / NLP; Geocoding / Historic place names; REST; External enrichment services; Publish service management OAI-PMH; RDF store; Elasticsearch; Archive]
Cloud architecture
• Decentralized
• Scalable
• Four cloud environments
– Storage
– Monitoring & logging
– Core services deployment
– Enrichment services deployment
Distributed
• Enrichment services run in:
– Austria
– Spain
– Greece
– Lithuania
– Slovenia
– Norway
• Scalability can be facilitated through a virtualization infrastructure
Workflow
[Workflow diagram. Labels shown: OAI-PMH; LoCloud Collections; Wikimedia; MINT; Omeka; Harvest; Ingest; Transform; Enrich; Publish; Validate; Index; Delete; Reject; OAI-PMH; Archive; RDF store; Solr]
Intermediate Schemas
[Diagram with two lists of schemas. First list: Dublin Core, LIDO, CARARE, EAD, ESE, EDM. Second list: Dublin Core, LIDO, CARARE, EAD, ESE, EDM, OMEKA-XML, OGD]
Core services
• Harvesting
• Validation
• Ingestion
• Transformation
• Enrichment
• Previewing
• Publishing
Core services: Harvesting
• Harvests content from metadata sources
– OAI-PMH repository
– MINT
– LoCloud Collections
– Wikimedia
• Multiple schemas are supported
– OAI_DC
– CARARE
– CARARE 2.0
– LIDO
– EAD
– EDM
– ESE
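Harvesting from an OAI-PMH repository boils down to issuing `ListRecords` requests and following resumption tokens. The sketch below shows only the request URL and the response parsing, using the Python standard library; the function names, the base URL and the sample response are illustrative assumptions, not MoRe's actual harvester.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument:
        # it replaces metadataPrefix on follow-up requests.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)

def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) from a response."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]
    token = root.findtext(".//oai:resumptionToken", default=None, namespaces=OAI_NS)
    return ids, (token or None)

# Made-up, heavily trimmed response used only to exercise the parser.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

ids, token = parse_list_records(SAMPLE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
```

A real harvester would fetch each URL, loop until no resumption token is returned, and handle protocol errors; none of that is shown here.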
Core services: Validation
• Validates incoming information packages
• Executes validation schemes
• Validation micro-services
– Structure
– Schema
– Linking
– Schematron rules
• Flexible
• How it is used in MoRe
– Pre-validation
– Post-validation
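One way to picture a validation scheme is as a list of small checks run in sequence over a package. The sketch below is a simplified assumption of that idea: the check functions, the required element names and the sample records are invented, and the "schema" check is only a stand-in for real XSD or Schematron validation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

def local_name(tag):
    """Strip an '{namespace}' prefix from an ElementTree tag."""
    return tag.split("}")[-1]

def check_structure(xml_text):
    """Structure check: the package must be well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"structure: {exc}"]
    return []

def check_required(xml_text, required=("title", "identifier")):
    """Schema-like check: required elements must be present somewhere."""
    present = {local_name(el.tag) for el in ET.fromstring(xml_text).iter()}
    return [f"schema: missing <{name}>" for name in required if name not in present]

def check_links(xml_text):
    """Linking check: anything that looks like a URL must have a host part."""
    errors = []
    for el in ET.fromstring(xml_text).iter():
        text = (el.text or "").strip()
        if text.startswith("http") and not urlparse(text).netloc:
            errors.append(f"linking: bad URL in <{local_name(el.tag)}>")
    return errors

def validate(xml_text, checks=(check_required, check_links)):
    """Run the structure check first; the other checks need parseable XML."""
    errors = check_structure(xml_text)
    if errors:
        return errors
    for check in checks:
        errors.extend(check(xml_text))
    return errors

GOOD = "<record><title>Amphora</title><identifier>http://example.org/1</identifier></record>"
BAD = "<record><title>Amphora</title><source>http:/broken</source></record>"
```

Under this sketch, the same `validate` function could be called once before ingestion (pre-validation) and again on the transformed record (post-validation), possibly with a different list of checks.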
Core services: Ingestion
• Ingests content into storage
• Uses storage layer API
• Pluggable drivers for attaching different technologies / repositories
– Apache Cassandra
– Filesystem-based
– Fedora-commons
• Versioning support
• Complex digital object support
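The pluggable-driver idea can be sketched as a small interface that each storage backend implements. The class and method names below are assumptions for illustration; only an in-memory and a filesystem driver are shown, and nothing here reflects the actual MoRe storage layer API.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path

class StorageDriver(ABC):
    """Hypothetical minimal contract every storage backend would implement."""

    @abstractmethod
    def put(self, object_id: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, object_id: str) -> bytes: ...

class MemoryDriver(StorageDriver):
    """Keeps objects in a dict; handy for tests."""

    def __init__(self):
        self._objects = {}

    def put(self, object_id, data):
        self._objects[object_id] = data

    def get(self, object_id):
        return self._objects[object_id]

class FilesystemDriver(StorageDriver):
    """Stores each object as one file; assumes ids are safe file names."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, object_id, data):
        (self.root / object_id).write_bytes(data)

    def get(self, object_id):
        return (self.root / object_id).read_bytes()

def ingest(driver: StorageDriver, object_id: str, data: bytes) -> bool:
    """Store a package through whichever driver is plugged in, then read it back."""
    driver.put(object_id, data)
    return driver.get(object_id) == data

ok_mem = ingest(MemoryDriver(), "rec-1", b"<record/>")
with tempfile.TemporaryDirectory() as tmp:
    ok_fs = ingest(FilesystemDriver(tmp), "rec-1", b"<record/>")
```

The calling code in `ingest` never needs to know which backend is in use, which is the point of making drivers pluggable.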
Core services: Content Model
• Digital objects comprise data streams
• Each data stream can hold any kind of information
– XML/RDF, Image, Video, Documents, etc.
• Each different representation of an information object is stored as a different data stream
• Each curation action generates a new version
– Transformation, Enrichment
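The content model can be sketched with two small data classes: a digital object holding named data streams, and each data stream keeping its version history. The class names, field names and sample content are assumptions made for this example, not the actual MoRe data structures.

```python
from dataclasses import dataclass, field

@dataclass
class DataStream:
    name: str                      # e.g. "source" or "EDM"
    media_type: str                # e.g. "application/xml", "image/jpeg"
    versions: list = field(default_factory=list)  # (action, content), oldest first

    def add_version(self, content, action):
        """Record a new version produced by a curation action."""
        self.versions.append((action, content))

    @property
    def latest(self):
        return self.versions[-1][1]

@dataclass
class DigitalObject:
    object_id: str
    streams: dict = field(default_factory=dict)   # stream name -> DataStream

    def stream(self, name, media_type="application/xml"):
        """Return the named data stream, creating it on first use."""
        return self.streams.setdefault(name, DataStream(name, media_type))

obj = DigitalObject("rec-1")
obj.stream("source").add_version("<lido/>", "harvest")
obj.stream("EDM").add_version("<edm v='1'/>", "transformation")
obj.stream("EDM").add_version("<edm v='2'/>", "enrichment")
```

After these calls the object has two data streams (two representations of the same record), and the EDM stream has two versions, one per curation action, with the earlier one still available.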
Core services: Transformation
• Transforms entire information packages into the Europeana Data Model (EDM), or any other schema
• Multiple transformation routines
– Per schema
– Per project
– Per provider
• User can attach rights statement
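With routines defined per schema, per project and per provider, something has to decide which one applies to a given package. The sketch below assumes a "most specific match wins" rule, which is a guess on our part and not stated in the slides; the routine names, project and provider ids are invented.

```python
# Hypothetical registry: (schema, project, provider) -> routine name.
ROUTINES = {
    ("LIDO", None, None): "lido-to-edm",
    ("LIDO", "project-a", None): "lido-to-edm-project-a",
    ("LIDO", "project-a", "museum-x"): "lido-to-edm-museum-x",
}

def pick_routine(schema, project=None, provider=None):
    """Assumed precedence: provider-specific, then project-specific, then schema default."""
    candidates = (
        (schema, project, provider),
        (schema, project, None),
        (schema, None, None),
    )
    for key in candidates:
        if key in ROUTINES:
            return ROUTINES[key]
    raise LookupError(f"no transformation routine for schema {schema!r}")

def attach_rights(record, rights_uri):
    """Return a copy of the record with a rights statement attached (edm:rights)."""
    return {**record, "edm:rights": rights_uri}

routine = pick_routine("LIDO", "project-a", "museum-y")  # falls back to the project routine
record = attach_rights({"dc:title": "Amphora"},
                       "http://creativecommons.org/publicdomain/zero/1.0/")
```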
Core services: Enrichment
• The generic enrichment service facilitates the execution of the enrichment micro-services
• Hides the complexity from the user by using enrichment plans
• Provides seamless integration with the UI of MoRe
• Virtual Enrichment driver
– Allows developers/creative industries to create their own enrichment services and declare/use them within MoRe
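The "declare and use" idea can be pictured as a registry that third-party services add themselves to under a name. The decorator, the registry and the example service below are purely hypothetical; the slides do not describe how the Virtual Enrichment driver is actually implemented.

```python
# Hypothetical registry of enrichment services, keyed by declared name.
ENRICHMENT_SERVICES = {}

def declare(name):
    """Decorator that registers a function as a named enrichment service."""
    def register(func):
        ENRICHMENT_SERVICES[name] = func
        return func
    return register

@declare("trim-titles")
def trim_titles(record):
    """Toy third-party service: strip stray whitespace from titles."""
    titles = [t.strip() for t in record.get("dc:title", [])]
    return {**record, "dc:title": titles}

# Using a declared service by name, as the aggregator side might do.
enriched = ENRICHMENT_SERVICES["trim-titles"]({"dc:title": ["  Amphora "]})
```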
Core services: Previewing
• Preview the XML record information for all datastreams
• Preview the record in HTML (using the Europeana style sheet)
Core services: Publishing
• Publish transformed / enriched information
– Internal OAI-PMH provider
– XML export
– Publish directly to RDF repositories (Sesame, Virtuoso)
– Solr index server
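To make the OAI-PMH publishing route concrete, the sketch below wraps already-transformed records in a minimal `ListRecords` response using the standard library. It is an illustration only: the function name and sample record are assumptions, and a real provider also needs schema locations, error responses, sets and resumption tokens.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"

def q(tag):
    """Qualify a tag name with the OAI-PMH namespace."""
    return f"{{{OAI}}}{tag}"

def list_records_response(records, base_url, response_date):
    """Build a minimal ListRecords envelope.

    `records` is an iterable of (identifier, datestamp, metadata_xml);
    metadata_xml must be a namespaced, well-formed XML fragment.
    """
    root = ET.Element(q("OAI-PMH"))
    ET.SubElement(root, q("responseDate")).text = response_date
    request = ET.SubElement(root, q("request"), verb="ListRecords")
    request.text = base_url
    listing = ET.SubElement(root, q("ListRecords"))
    for identifier, datestamp, metadata_xml in records:
        record = ET.SubElement(listing, q("record"))
        header = ET.SubElement(record, q("header"))
        ET.SubElement(header, q("identifier")).text = identifier
        ET.SubElement(header, q("datestamp")).text = datestamp
        ET.SubElement(record, q("metadata")).append(ET.fromstring(metadata_xml))
    # ElementTree picks generated prefixes (ns0, ns1, ...); the XML is still valid.
    return ET.tostring(root, encoding="unicode")

# Made-up record for illustration.
DC_RECORD = (
    '<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/"><dc:title>Amphora</dc:title></oai_dc:dc>'
)
response = list_records_response(
    [("oai:example.org:1", "2014-01-01", DC_RECORD)],
    "http://example.org/oai",
    "2014-01-02T00:00:00Z",
)
```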
Enrichment micro-services
• Thematic
– Thesauri collections
– Vocabulary matching
– Background links
• Spatial
– Geo normalization
– Geo coding
– Reverse geo-coding
– Historic place names
• Other
– Language identification
[Diagram labels: SKOS Thesauri; GeoNames; DBpedia; Wikipedia]
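As one example of a thematic micro-service, vocabulary matching can be reduced to looking up a record's subject terms in a thesaurus and attaching the matching concept URIs. The sketch below uses a tiny made-up label-to-URI table in place of a real SKOS thesaurus; the URIs, labels and function name are all invented for illustration.

```python
# Made-up thesaurus: normalized label -> concept URI. Synonyms share a URI,
# roughly as preferred and alternative labels would in SKOS.
THESAURUS = {
    "pottery": "http://example.org/concept/pottery",
    "ceramics": "http://example.org/concept/pottery",
    "coin": "http://example.org/concept/coin",
}

def match_vocabulary(subjects, thesaurus):
    """Return the distinct concept URIs matched by the given subject terms."""
    matched = []
    for subject in subjects:
        uri = thesaurus.get(subject.strip().lower())
        if uri and uri not in matched:
            matched.append(uri)
    return matched

uris = match_vocabulary(["Pottery", " ceramics ", "sword"], THESAURUS)
```

Both "Pottery" and "ceramics" resolve to the same concept, and "sword" is simply left unmatched; a real service would need fuzzier matching and language handling.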
Enrichment Plan
• Enrichment micro-services are used within enrichment workflows
– Enrichment plans
• Each enrichment plan applies to a specific schema
• Each enrichment plan executes enrichment micro-services in a specific order
[Diagram: an enrichment plan made up of the micro-services Language identification, Vocabulary matching, Geo-normalization and Geo-coding]
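An enrichment plan, as described here, is tied to one schema and runs its micro-services in a fixed order. The sketch below models that with a plain dictionary and stub services that only record that they ran; the plan format, the service names used as keys and `run_plan` are assumptions, not MoRe's plan format.

```python
def stub(name):
    """Build a placeholder service that only notes that it was applied."""
    def service(record):
        return {**record, "applied": record.get("applied", []) + [name]}
    return service

STEPS = ["language-identification", "vocabulary-matching",
         "geo-normalization", "geo-coding"]
SERVICES = {name: stub(name) for name in STEPS}

# Hypothetical plan: one schema, an ordered list of micro-services.
PLAN = {"schema": "EDM", "steps": STEPS}

def run_plan(plan, record, services):
    """Apply each step in order; refuse records of another schema."""
    if record.get("schema") != plan["schema"]:
        raise ValueError(f"plan is for {plan['schema']}, got {record.get('schema')}")
    for step in plan["steps"]:
        record = services[step](record)
    return record

result = run_plan(PLAN, {"schema": "EDM"}, SERVICES)
```

Order matters in practice: for instance, a geo-coding step would typically expect place names that have already been normalized.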
Enrichment Plan
• Each enrichment plan defines run-time parameters for specific services
– Content based
[Diagram: an enrichment plan (Language identification, Vocabulary matching, Geo-normalization, Geo-coding) annotated with an example content-based parameter: "Add subject collection A only if term X or Y is matched"]
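The example rule from the slide ("add subject collection A only if term X or Y is matched") can be expressed as a small content-based parameter evaluated at run time. The rule format, field names and `apply_rule` below are assumptions chosen for illustration.

```python
# Hypothetical encoding of the slide's rule as a run-time parameter.
RULE = {"add_collection": "A", "if_any_term": {"X", "Y"}}

def apply_rule(record, matched_terms, rule):
    """Add the subject collection only if at least one trigger term matched."""
    if not rule["if_any_term"] & set(matched_terms):
        return record  # condition not met: record is left untouched
    collections = record.get("subject_collections", [])
    if rule["add_collection"] not in collections:
        collections = collections + [rule["add_collection"]]
    return {**record, "subject_collections": collections}

with_match = apply_rule({}, ["X", "Q"], RULE)
without_match = apply_rule({}, ["Z"], RULE)
```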