Aggregation Workflow at Europeana Aggregator Forum

Aggregation workflow Dimitra Atsidis Aggregator Forum, 22-23 May, The Hague

description

An overview of the Europeana Aggregation Workflow presented by Dimitra Atsidis of the Europeana Ingestion Team

Transcript of Aggregation Workflow at Europeana Aggregator Forum

Page 1: Aggregation Workflow at Europeana Aggregator Forum

Aggregation workflow

Dimitra Atsidis

Aggregator Forum, 22-23 May, The Hague

Page 2: Aggregation Workflow at Europeana Aggregator Forum

Content

Publication policy

Potential partners process

Submission deadlines for new and existing providers

Europeana ingestion workflow

Acceptance criteria and Europeana validation

Guidance and help – Europeana pro

Future plans for Europeana aggregation workflow

Exercise – The ideal aggregation workflow

Page 3: Aggregation Workflow at Europeana Aggregator Forum

Publication Policy

Page 4: Aggregation Workflow at Europeana Aggregator Forum

Publication Policy

Clear criteria for acceptance or decline of metadata for publication and for take down of legacy metadata from the Europeana database

Ingestion workflow (deadlines, timelines, prioritisation)

Content scope (what is a digital object, kind of content)

Technical validation of metadata quality (expected values)

Metadata licensing (CC0)

Rights Statements for digital objects

• All digital objects with valid edm:rights

• PD objects labelled as PD

• edm:rights & dc:rights not contradictory

Page 5: Aggregation Workflow at Europeana Aggregator Forum

Publication Process and Workflow

Page 6: Aggregation Workflow at Europeana Aggregator Forum

How to become a data provider to Europeana?

Page 7: Aggregation Workflow at Europeana Aggregator Forum

New Provider Timeline

Steps:

• First delivery of data: samples or full datasets

• Feedback on:

- structure of the metadata

- mandatory elements

- rights statements

• Feedback taken into account: new delivery of datasets ready to be ingested

• Ingestion of the datasets that are compliant with the publication policy

• Publication in Europeana of the submitted datasets

Actors: Provider, Provider, Europeana, Europeana, Europeana

TIMELINE

• Month 1

• Between the 1st and the 5th of month 1

• After the 5th of month 1

• Before the 21st of month 1

• Before the 21st of month 2

• Before the 21st of month 3

• Between the 10th and the 20th of month 2

• Between the 10th and the 20th of month 3

• Between the 10th and the 20th of month 4

Page 8: Aggregation Workflow at Europeana Aggregator Forum

Regular Ingestion Cycle

TIMELINE

• 1st of month 1

• Around the 21st of month 1

• Around the 1st of month 2

• Between the 10th and the 20th of month 2

PROVIDER FOR EUROPEANA

• Delivers samples of metadata or full datasets for feedback

• Delivers full datasets to be included in the coming publication

• If needed, delivers a corrected version of datasets ready to be ingested

EUROPEANA AGGREGATION TEAM

• Sends feedback about:

- structure of the metadata

- mandatory elements

- rights statements

• Processes the datasets and/or sends feedback about:

- structure of the metadata

- mandatory elements

- rights statements

• Ingests the datasets to be included in the coming publication, provided that they are compliant with the publication policy

• Communicates that the publication is live and sends the final report for the processed datasets

Milestones

• SUBMISSION DEADLINE: no new dataset will be accepted for month 2 publication after the deadline

• Monthly ingestion complete

• Publication is finalized

• Processed datasets can be retrieved on the Europeana portal
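The deadline rule on this slide can be sketched in a few lines of Python. This is only an illustration: the slide says "around the 21st", so treating the 21st as a hard cut-off is an assumption, and the function and constant names are hypothetical.

```python
from datetime import date
from typing import Tuple

# Assumption: "around the 21st" is treated as a hard cut-off day.
SUBMISSION_DEADLINE_DAY = 21
# Publication falls between the 10th and the 20th of the target month.
PUBLICATION_START_DAY = 10
PUBLICATION_END_DAY = 20


def publication_window(submitted: date) -> Tuple[date, date]:
    """Return the (earliest, latest) expected publication dates for a delivery."""
    # A delivery before the deadline targets next month's publication;
    # a later one slips to the month after that.
    months_ahead = 1 if submitted.day < SUBMISSION_DEADLINE_DAY else 2
    month_index = submitted.year * 12 + (submitted.month - 1) + months_ahead
    year, month_zero_based = divmod(month_index, 12)
    month = month_zero_based + 1
    return (
        date(year, month, PUBLICATION_START_DAY),
        date(year, month, PUBLICATION_END_DAY),
    )
```

A dataset delivered on the 15th would be expected in the following month's publication, while one delivered on the 25th would wait one month longer.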

Page 9: Aggregation Workflow at Europeana Aggregator Forum

Europeana Ingestion Workflow

• In SugarCRM populate with info about organizations and datasets

• In UIM Enrich Collection workflow (to index and enrich dataset/s)

• In Thumbler caching links to thumbnails (objects/previews)

• SugarCRM (to control and set the correct Ingestion Status)

• Content Contribution Form (for new partners)

• Scheduling of Ingestion

• Load dataset/s in UIM Harvest in REPOX

• In MINT map dataset/s

• In UIM Dereference Collection (if needed)

• In UIM Create Record Redirects (if needed)

• Dataset/s published in Europeana.eu

• Via UIM load dataset/s in MINT

• Via MINT load mapped dataset/s in UIM

Page 10: Aggregation Workflow at Europeana Aggregator Forum

Acceptance criteria

Page 11: Aggregation Workflow at Europeana Aggregator Forum

Acceptance criteria

Completed and submitted the Data Exchange Information Form.

Data Exchange Agreement to Europeana

o Aggregators need to submit the signed Data Exchange Agreements of their data providers

o Aggregators can use template clauses for the agreement between aggregators and data providers: http://pro.europeana.eu/ensuring-permissions-for-aggregators

Metadata are accepted for publication after the feedback of the Europeana Operations Officers

o EDM schema and guidelines

o Rights labeling

Datasets are prioritised for publication if the edm:rights in the majority of the metadata of the dataset is PDM, CC0, CC BY or CC BY-SA

Datasets submitted via OAI-PMH protocol, FTP or file
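The prioritisation rule above (a majority of edm:rights values being PDM, CC0, CC BY or CC BY-SA) could be checked roughly as follows. This is a sketch, not Europeana's implementation: matching on Creative Commons URI prefixes and reading "majority" as strictly more than half are both assumptions, and the function name is hypothetical.

```python
# Assumption: edm:rights values are Creative Commons URIs, matched by prefix
# so that any version of a licence counts.
OPEN_RIGHTS_PREFIXES = (
    "http://creativecommons.org/publicdomain/mark/",  # PDM
    "http://creativecommons.org/publicdomain/zero/",  # CC0
    "http://creativecommons.org/licenses/by/",        # CC BY
    "http://creativecommons.org/licenses/by-sa/",     # CC BY-SA
)


def is_prioritised(rights_values):
    """Return True if more than half of the edm:rights values are open."""
    values = list(rights_values)
    if not values:
        return False
    open_count = sum(1 for v in values if v.startswith(OPEN_RIGHTS_PREFIXES))
    # Assumption: "majority" means strictly more than half of the records.
    return open_count > len(values) / 2
```

Because the CC BY prefix ends in a slash, licences such as CC BY-NC are not counted as open by this check.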

Page 12: Aggregation Workflow at Europeana Aggregator Forum

Europeana validation

Automatic validation:

Validation according to the EDM schema (or ESE v3.4)

Validation of the mandatory properties

Unique identifiers

o Metadata records that don’t meet this validation are discarded

o Providers can fix issues first and resubmit or let Europeana ingest the records that are valid, and fix the invalid records at a later stage

Validation of URLs for thumbnail creation (ImageMagick)
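As an illustration of the unique-identifier check, the sketch below splits a batch into records that can be ingested now and records to be fixed later. The record layout (dictionaries with an "identifier" key) is hypothetical, and discarding every record that shares an identifier, rather than keeping the first one, is an assumption.

```python
from collections import Counter


def split_by_unique_id(records, id_key="identifier"):
    """Split records into (valid, invalid) based on identifier uniqueness."""
    counts = Counter(record.get(id_key) for record in records)
    valid, invalid = [], []
    for record in records:
        record_id = record.get(id_key)
        # A record is valid only if it has an identifier that no other
        # record in the batch uses.
        if record_id and counts[record_id] == 1:
            valid.append(record)
        else:
            invalid.append(record)
    return valid, invalid
```

The valid list corresponds to what Europeana could ingest immediately; the invalid list is what a provider would correct and resubmit.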

Page 13: Aggregation Workflow at Europeana Aggregator Forum

Mandatory properties

Applicable class    Mandatory properties (or alternatives)
Aggregation         edm:dataProvider
Aggregation         edm:isShownAt or edm:isShownBy
Aggregation         edm:provider
Aggregation         edm:rights
Aggregation         edm:aggregatedCHO
Aggregation         edm:ugc (when applicable)
ProvidedCHO         dc:title or dc:description
ProvidedCHO         dc:language for text objects
ProvidedCHO         dc:subject or dc:type or dc:coverage or dcterms:spatial
ProvidedCHO         edm:type
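The mandatory properties listed on this slide lend themselves to a simple check. The sketch below assumes a record has already been parsed into nested dictionaries keyed by class and property name (a hypothetical layout, not the EDM XML itself), and it treats "text objects" as records whose edm:type is TEXT. edm:ugc is left out because it applies only in some cases.

```python
# Each tuple is a group of alternatives: at least one must be present.
MANDATORY = {
    "Aggregation": [
        ("edm:dataProvider",),
        ("edm:isShownAt", "edm:isShownBy"),
        ("edm:provider",),
        ("edm:rights",),
        ("edm:aggregatedCHO",),
    ],
    "ProvidedCHO": [
        ("dc:title", "dc:description"),
        ("dc:subject", "dc:type", "dc:coverage", "dcterms:spatial"),
        ("edm:type",),
    ],
}


def missing_properties(record):
    """Return a list of mandatory property groups the record does not satisfy."""
    problems = []
    for class_name, groups in MANDATORY.items():
        present = record.get(class_name, {})
        for alternatives in groups:
            if not any(present.get(prop) for prop in alternatives):
                problems.append(class_name + ": " + " or ".join(alternatives))
    # dc:language is required only for text objects.
    cho = record.get("ProvidedCHO", {})
    if cho.get("edm:type") == "TEXT" and not cho.get("dc:language"):
        problems.append("ProvidedCHO: dc:language")
    return problems
```

An empty result means the record passes this particular check; it says nothing about schema validity or the quality of the values.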

Page 14: Aggregation Workflow at Europeana Aggregator Forum

Europeana validation

Validation by the operations officers:

Feedback follows the EDM schema and guidelines

Check that links are working and are direct links of reasonable size

Recommendations to include thumbnails, geolocations, etc.

Feedback on (near) duplicate records, and on taking advantage of the EDM

Feedback on rights statements in edm:rights and dc:rights

Relations between the EDM classes

Correct use of vocabularies

Literals vs resources (e.g. a thumbnail always needs to be a valid URL)

Feedback on any other metadata quality related matters (duplication of properties, encoding in the data, wrongly mapped properties, etc.)

Etc., etc.
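The "literals vs resources" point can be illustrated with a small syntactic check. This sketch only tests whether a value looks like an absolute http(s) URL; it does not fetch the link or check its size, and the function name is hypothetical.

```python
from urllib.parse import urlparse


def is_resource_url(value):
    """Return True if the value is syntactically an absolute http(s) URL."""
    parsed = urlparse(value.strip())
    # A resource needs both a web scheme and a host; free text has neither.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A value such as a free-text caption in a thumbnail field would fail this check and could be flagged in feedback to the provider.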

Page 15: Aggregation Workflow at Europeana Aggregator Forum

Guidance and help

Page 16: Aggregation Workflow at Europeana Aggregator Forum

Guidance and help

Europeana Professional: http://pro.europeana.eu/provide-data

Content inbox – for all ingestion & metadata related matters [email protected]

Page 17: Aggregation Workflow at Europeana Aggregator Forum

Questions?

Page 18: Aggregation Workflow at Europeana Aggregator Forum

Future plans for aggregation workflow

Page 19: Aggregation Workflow at Europeana Aggregator Forum

Future plans for aggregation workflow

The big plan is to open up part of the ingestion workflow to providers

• Providers can log in and identify the aggregator/project they work for

• Providers can select the datasets they want to update, or add new datasets

• Providers can upload their data – protocols besides OAI-PMH and FTP are under discussion

• Providers can map their data to EDM, or edit data that is already EDM

• Providers can validate the data against the EDM schema and preview in a preview portal

Europeana wants to provide tools for uploading data, validating, mapping, and previewing

Other tools and workflows being considered: link checking, thumbnail caching, enrichment

Start with a test environment, to preview and validate a subset of data before sending it to Europeana

Eventually, open up part of the Europeana workflow to providers, not only for testing but integrated into the ingestion workflow.

Page 20: Aggregation Workflow at Europeana Aggregator Forum

Future plans for aggregation workflow

Benefits for providers:

• Possibility to map to EDM

• Validation according to the EDM schema (with schematron rules we implemented)

• Previewing before publication

• Self service, less dependent on Europeana, saving time (you can do many steps yourself, and you spot errors earlier)

Benefits for Europeana:

• Scale up operations: the number of projects, aggregators and therefore datasets has grown exponentially in recent years

• To focus more on metadata quality and assisting providers as much as possible with EDM, modelling and metadata related questions

• Making the ingestion process transparent and more connected to the process on the aggregators' side

Page 21: Aggregation Workflow at Europeana Aggregator Forum

The ideal aggregation workflow

Consider your own aggregator route, from the data provider to the aggregator to data provision to Europeana

Consider also the current aggregation workflow of Europeana and the future plans presented

Now, draw the ideal workflow to get your data from the data provider, through your aggregator into Europeana. Make a diagram, a mindmap, or whatever comes to mind.

Think, for instance, about the following questions:

What steps in your current workflow could you use help with (e.g. mapping, validation, rights clearance)?

Would you use any of the workflow steps Europeana plans to open up? Why, or why not?

Are there any tools you already use that you could recommend to everyone?

Would the aggregator or the data providers (or both) benefit from and use the tools?

Use the yellow post-its to signal positive things, improvements, easy wins (and why?)

Use the pink post-its to signal foreseeable issues or difficulties (and why?)

Page 22: Aggregation Workflow at Europeana Aggregator Forum

Thank you!

Dimitra Atsidis

[email protected]