On Metadata for Open Data
Yannis Charalabidis
25.04.2012
Introduction
In the next slides we will try to show what level of metadata handling is expected
from a 2nd generation open data system.
Imagine you are in front of the ENGAGE system, and you have the URI of a dataset
somewhere in the cloud (copied as a string to the clipboard).
And begin …
Prescreening: User only gives URI of the dataset
Enter (paste) the URI of your dataset
_
(then for 30 seconds you see this screen, changing)
Progress of ENGAGE Resource Prescreening: ( 45% ) of jobs completed
Managed to:
Identify xls file
Autofill, provisionally: Title
Autofill, provisionally: Creator
Create unique ENGAGE URI
Calculate keywords
Autofill, provisionally: keywords
……
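A rough sketch of two of the prescreening jobs listed above (identifying the file format and provisionally autofilling the title), written in Python. The function name `prescreen`, the `KNOWN_FORMATS` set and the idea of deriving the provisional title from the file name are assumptions for illustration only; the slides do not describe how ENGAGE implements these jobs.

```python
import posixpath
from urllib.parse import unquote, urlparse

# Illustrative subset of the format codelist; the real list is not given here.
KNOWN_FORMATS = {"xls", "xml", "pdf"}


def prescreen(uri: str) -> dict:
    """Guess the format and a provisional title from a dataset URI."""
    path = unquote(urlparse(uri).path)
    filename = posixpath.basename(path)
    stem, ext = posixpath.splitext(filename)
    ext = ext.lstrip(".").lower()
    return {
        # Unrecognised or missing extensions are reported as "unknown".
        "format": ext if ext in KNOWN_FORMATS else "unknown",
        # Assumption: use the file name as a provisional, user-editable title.
        "title": stem.replace("_", " ").replace("-", " ").strip(),
    }
```

Anything filled in this way is provisional: the user confirms or corrects it on the metadata insertion page.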
(When the import finishes, the report)
Report
ENGAGE managed to automatically, provisionally fill in ( 21 ) of 43 metadata attributes for your dataset.
Your current validity is at ( 45% )
For your dataset to be inserted in the database, you need to continue filling in ( 5 ) mandatory attributes.
Your dataset will then be inserted with validity ( 55% )
If all ( 17 ) non-mandatory attributes are filled in, validity will be at its maximum, 70%, the limit of the insertion phase.
Please select next action: Continue / Park / Cancel
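The counts shown in the report can be derived directly from the attribute definitions. A minimal Python sketch, assuming a hypothetical `Attribute` record and hypothetical `prescreening_report` and `can_insert` helpers (none of these names come from ENGAGE). The validity percentages are deliberately not computed, because the slides do not state their formula.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attribute:
    """Hypothetical record for one metadata attribute of a dataset."""
    name: str
    mandatory: bool
    value: Optional[str] = None  # None means not filled in yet

    @property
    def filled(self) -> bool:
        return self.value is not None and self.value.strip() != ""


def prescreening_report(attributes: list[Attribute]) -> dict[str, int]:
    """Count filled attributes and what is still missing."""
    filled = sum(1 for a in attributes if a.filled)
    missing_mandatory = sum(1 for a in attributes if a.mandatory and not a.filled)
    missing_optional = sum(1 for a in attributes if not a.mandatory and not a.filled)
    return {
        "total": len(attributes),
        "filled": filled,
        "missing_mandatory": missing_mandatory,
        "missing_optional": missing_optional,
    }


def can_insert(report: dict[str, int]) -> bool:
    """A dataset may be inserted once no mandatory attribute is empty."""
    return report["missing_mandatory"] == 0
```

With the numbers on the slide, such a report would read: total 43, filled 21, 5 mandatory attributes missing, 17 non-mandatory attributes missing.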
After import …
… and then we enter the metadata insertion page with pre-filled data, etc.
When we finish, we get a similar final report.
And now, the ENGAGE metadata set that makes all of this possible:
But before, some semantics:
Attribute characteristics – notation:
(M) : attribute is Mandatory (cannot be empty)
(*) : attribute takes values from a controlled list of terms (codelist), or tree (DAG of terms), or table
(+) : takes values from an extendible list or tree. User may extend the list during insertion
(a) : an auto-filling list (as suggestion) or otherwise automatically calculated attribute
(m) : attribute accepts multiple values
(v) : attribute entry can be verified through a type-checking algorithm
(( x )) : x is possible, but as an option
no tag : attribute is a simple string entry
---------- for the future -------------
(c0), (c1), (c2), (c3) : the importance of the attribute in the completeness calculation (c3 is the highest, i.e. most important)
(q0), (q1), (q2), (q3) : the importance of the attribute in the data quality calculation (q3 is the highest, i.e. most important)
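To make the notation concrete, here is a small Python sketch that parses a tag string such as `(M)(*) ((a)) (m)` into boolean flags. The `AttributeFlags` class and `parse_flags` function are hypothetical names introduced for illustration. Tags not covered by the list above, as well as the future (c…) and (q…) weights, are simply ignored in this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class AttributeFlags:
    mandatory: bool = False      # (M)
    controlled: bool = False     # (*)
    extendible: bool = False     # (+)
    auto: bool = False           # (a)
    auto_optional: bool = False  # ((a)): auto-fill possible, but as an option
    multiple: bool = False       # (m)
    verifiable: bool = False     # (v)


# Tries the doubled form "((x))" first, then the plain form "(x)".
_TAG = re.compile(r"\(\(\s*([^()\s]+)\s*\)\)|\(\s*([^()\s]+)\s*\)")

_FIELD = {"M": "mandatory", "*": "controlled", "+": "extendible",
          "m": "multiple", "v": "verifiable"}


def parse_flags(notation: str) -> AttributeFlags:
    """Parse a notation string into flags; tags are case-sensitive."""
    values = {}
    for optional_tag, plain_tag in _TAG.findall(notation):
        if optional_tag == "a":
            values["auto_optional"] = True
        elif plain_tag == "a":
            values["auto"] = True
        elif plain_tag in _FIELD:
            values[_FIELD[plain_tag]] = True
        # Any other tag is ignored in this sketch.
    return AttributeFlags(**values)
```

An attribute with no tag at all yields the default flags, which matches the "simple string entry" case.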
A. The core attributes

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Title. Automatic: extracted from the dataset headline of the URI/dataset provided | (M) ((a)) String | - | - | - |
| Publisher. PUB admin tree (100 per country, extendible) | (M)(*)(+) Pointer to Tree | Tree of Strings | 100 X country | Greece (ENG) |
| Creator. PUB admin tree (100 per country, extendible). Prompt: same as the publisher | (M)(*)(+) Pointer to Tree | Tree of PS entities | 100 X country | Greece (ENG) |
| Code. Automatic: ENGAGE automatic classification system (date, country, PSector, type, etc) or ENGAGE URI | (M)(*)(a) String | - | - | - |
| User. The user who uploads that. Automatic filling from table of users / login | (*)(a) Pointer to Table | Table of Users | - | - |

B. The outer core attributes

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Subject. Text describing the resource in one sentence. It can be stored in a list and reused | (M)(*)(+) Pointer to List | List of strings | All resource subjects | NO |
| Type. List of types: dataset, linkable dataset, visualization, textual information, executable binary, unknown | (M)(*)(m) Pointer to List | List of strings | 10 | ENG |
| Format. xls xml odata … jpd pdf … (appr. 50 format types) | (M)(*)(+) Pointer to List | List of strings | 50 | ENG |
| Language. ISO simplified (5 < 20 (EU) < ISO (3000)). Automatic: extract from language settings (when XLS / ISO) | (M)(*) ((a)) (m) Pointer to List | List of strings | 200 | ISO List (ENG) |
| Country. 5 ENGAGE countries < rest of 27 EU < other countries. ISO country list | (M)(*)(m) Pointer to List | List of strings | 200 | ISO List (ENG) |

C. The Public Sector Context

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Public Sector Domain. Tree of sectors (20: finance, health, social security, etc). Automatic: can be calculated from Creator, if all public sector entities have a domain | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | ENG, GR |
| Relative Public Service. List of public services (i2010 20 basic services, plus “Other reward service”, “Other permission service”, “Other registry entry service”, “Other personal documents service”) | (*)(m)(+) Pointer to List | List of strings | 24 | ENG, GR |
| Relative Information System. List of EU and national main information systems (50+50*country) | (*)(m)(+) Pointer to List | List of strings | 200 | GR |
| Legal Framework. Main EU directives on open data (10), main national laws and decrees on open data (10 X country) | (*)(m)(+) | Table of Legal Elements | 100 | GR |

D. The Scientific Context

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Scientific Sector. ENGAGE Tree of Scientific Domains | (*)(m) Pointer to Tree | Tree of strings | 100 | Science |
| Scientific Usage of Resource. ENGAGE tree of scientific types/usages: events data (nature or man-made), financial data, health data, etc (20) | (*)(m)(+) Pointer to Tree | Tree of strings | 20 | Science |
| Intended Audience. List of possible audiences: citizens, enterprises, researchers, public sector managers, public sector officers, policy makers, members of National Parliament, MEPs, NGOs etc | (*)(m)(+) Pointer to List | Tree of strings | 20 | ENGAGE |
| Keywords. Initial list made / proposed by ENGAGE System with countries, PSector Domain, Science Domain, Usage. Also get from linked areas / domains / types etc | (*)(m)(+)(a) Pointer to List | List of strings | 200 | - |

E. URLs – URIs – Links

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Type of Source Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG |
| Source Link (URL). String or ENGAGE URL (*)(a). Automatic: put the URL of ENGAGE site | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |
| Type of Resource Link. URL / URI / DOI / WS / RSS / ENGAGE / other | (*)(+) Pointer to List | List of Strings | 10 | ENG |
| Resource Link. String or ENGAGE (a). Automatic: lists the link it already has | (*) (+) ((a)) Pointer to List | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |
| Relevant Resources. List of existing URIs in the system. Automatic: calculates from matching domain+type+ | (*)(m)(+)(a) | List of Strings | Codelist is the full list of URIs in ENGAGE | Yes |

F. Linked Data

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Linking status. Linkable, linked, non-linked, non-linkable, unknown | (*) Pointer to List | List of Strings | 5 | YES |
| Linked Data Set. URI of a linked dataset. Details of link: | (*)(m)(+)(a)(d) Pointer to List | List of URIs | No limit | - |
| Link detail: Linking Type (PK match) | Pointer to List | List of Strings | 1 | - |
| Link detail: Matching column of this resource | String | - | - | - |
| Link detail: Matching column of linked resource | String | - | - | - |
| Link detail: Columns of this resource, to be included | (m) String | - | - | - |
| Link detail: Columns of linked resource, to be included | (m) String | - | - | - |
| Visualisations. Links to visualisations of current resource | (*)(m)(+)(a)(d) Pointer to List | List of URIs | No limit | - |

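The link details above (a matching column in each resource, plus the columns to include from each side) describe a join on a key. A minimal Python sketch of such a "PK match", assuming both resources have already been loaded as lists of row dictionaries; the function name `link_by_key` and its parameters are hypothetical and are not part of ENGAGE.

```python
def link_by_key(this_rows, linked_rows, this_key, linked_key,
                this_columns, linked_columns):
    """Join two tabular resources on matching key columns (PK match).

    Rows are dicts mapping column name to value. Only the requested
    columns are copied into the result; rows of this resource without
    a match in the linked resource are skipped.
    """
    # Index the linked resource by its key column for fast lookups.
    # Assumes the key is unique in the linked resource (a primary key).
    index = {row[linked_key]: row for row in linked_rows}

    result = []
    for row in this_rows:
        match = index.get(row[this_key])
        if match is None:
            continue
        merged = {col: row[col] for col in this_columns}
        merged.update({col: match[col] for col in linked_columns})
        result.append(merged)
    return result
```

The five arguments after the row lists correspond one-to-one to the link detail attributes: the two matching columns and the two lists of columns to be included.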
G. Dates and Status

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Consideration Started on | (v) DATE | - | - | - |
| Initial Approval / Planning Started on | (v) DATE | - | - | - |
| Planned to be valid on | (v) DATE | - | - | - |
| Validity Started on | (v) DATE | - | - | - |
| Validity to finish on | (v) DATE | - | - | - |
| Rejected on | (v) DATE | - | - | - |
| Substituted on | (v) DATE | - | - | - |
| Status. Considered, planned, valid, valid and linked, rejected, outdated, substituted. Automatic: calculation through DATES | (*) (a) Pointer to List | List of Strings | 8 | ENG |

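The slide says only that Status is calculated from the dates, not how. The Python sketch below shows one possible rule set. The precedence order (substituted, then rejected, then outdated/valid, then planned, then considered), the `linked` flag and the fallback value "unknown" are all assumptions made for illustration, as are the names `ResourceDates` and `derive_status`.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ResourceDates:
    consideration_started: Optional[date] = None
    planning_started: Optional[date] = None
    planned_valid_on: Optional[date] = None  # informational; not used below
    validity_started: Optional[date] = None
    validity_finishes: Optional[date] = None
    rejected_on: Optional[date] = None
    substituted_on: Optional[date] = None


def _reached(d: Optional[date], today: date) -> bool:
    """True if the date is set and is not in the future."""
    return d is not None and d <= today


def derive_status(d: ResourceDates, today: date, linked: bool = False) -> str:
    """Derive a status from the dates; the precedence order is an assumption."""
    if _reached(d.substituted_on, today):
        return "substituted"
    if _reached(d.rejected_on, today):
        return "rejected"
    if _reached(d.validity_started, today):
        # Assumption: a resource is outdated once its end date has passed.
        if d.validity_finishes is not None and d.validity_finishes < today:
            return "outdated"
        return "valid and linked" if linked else "valid"
    if _reached(d.planning_started, today):
        return "planned"
    if _reached(d.consideration_started, today):
        return "considered"
    return "unknown"  # fallback assumed here, not taken from the slide
```

Passing `today` explicitly, rather than reading the system clock inside the function, keeps the rule easy to test.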
H. Rating

| Metadata Attribute | Type of Attribute | Type of codelist | Size of codelist (nodes) | Existing codelists |
|---|---|---|---|---|
| Metadata Completeness. Automatic: calculated by filled / empty non-mandatory items | Number (1-100) | - | - | - |
| Metadata Quality. Automatic: calculated by specific filled / empty non-mandatory items | Number (1-100) | - | - | - |
| Citizen Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Researcher Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Business Rating. As reported / calculated by relative users | Number (1-100) | - | - | - |
| Number of Downloads. As reported by the ENGAGE System | Number | - | - | - |
| Density of Downloads. As number per total period of validity to date | Number % | - | - | - |

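Two of the automatic ratings can be illustrated with simple arithmetic. The Python sketch below reads "filled / empty non-mandatory items" as the percentage of non-mandatory attributes that are filled in, and "number per total period of validity to date" as downloads per day since validity started. Both readings, the unweighted count, the per-day unit, and the function names `completeness` and `download_density` are assumptions; the slides do not give exact formulas.

```python
from datetime import date


def completeness(optional_values: dict) -> int:
    """Percentage of non-mandatory attributes that are filled in."""
    if not optional_values:
        return 100  # nothing optional to fill; a choice made for this sketch
    filled = sum(1 for v in optional_values.values() if v not in (None, "", []))
    return round(100 * filled / len(optional_values))


def download_density(downloads: int, validity_started: date, today: date) -> float:
    """Downloads per day over the validity period to date."""
    days = (today - validity_started).days
    if days <= 0:
        raise ValueError("validity period must have started before today")
    return downloads / days
```

Note that this unweighted completeness can be 0 when nothing optional is filled in, whereas the table gives the range as 1-100. A weighted variant could later use the (c0) to (c3) importance levels from the notation slide.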
An Infrastructure for Open, Linked Governmental Data Provision towards
Research Communities and Citizens
Proposal Evaluation Hearing
Brussels 23/2/2011
Not to forget: metadata codelists were there since the Hearing … !
Q6: Which types of metadata will you select?
• Exploit work already done by the consortium (DELFT, NTUA, AEGEAN, STFC) in public sector metadata schemas
• Multi-facet design: take into consideration the fact that the data may be used in different contexts, such as research, policy making or by citizens
• Take into consideration the fact that data sources may provide wildly differing metadata – go towards metadata standardisation for Open Data / a major contribution of ENGAGE
• Two-phase metadata design within ENGAGE workplan (Task C1.2: Data and knowledge representation annotation and linking methods). Initial proposal, based on Dublin Core, UK eGovernment Metadata Schema and eGMS+, is as follows:
ENGAGE Metadata Set:
Identifier, Title, Creator
Publisher, Country, Source
Type (*), Format (*), Language (*)
Sector (*), Subject (*), Keywords (*)
Relative Public Service (*), Relative Information System, URL / URI / DOI
Validity Date (from – to), Audience (*), Legal Framework
Status (*), Relevant Resources, Linked Data Sets (*)
(*) Indicates Controlled Lists / Taxonomies