Online Learning and Linked Data: Lessons Learned and Best Practices
Dataset Profiling
7 April 2023, Besnik Fetahu
LinkedUp: Data Catalog Features
34 linked datasets of educational relevance (http://datahub.io/dataset?organization=linked-education)
VoID representations of datasets include the following information:
Manual dataset schema alignments
Accessibility information, i.e. SPARQL endpoint URL
http://purl.org/ontology/bibo/Thesis owl:equivalentClass http://purl.org/ontology/bibo/Thesis
http://swrc.ontoware.org/ontology#Article owl:equivalentClass http://purl.org/ontology/bibo/AcademicArticle
http://data.linkededucation.org/linkedup/dataset/data-open-ac-uk void:sparqlEndpoint http://data.open.ac.uk/query
Co-occurrence graph of data types in 146 datasets: 144 vocabularies, 588 highly overlapping types, 719 properties
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
LinkedUp: Data Catalog Features
VoID representations of datasets include the following information:
Datasets’ resources type graph
Datasets’ Topic Extraction (Dataset Profiling)
[Figure: dataset resource type graph; legible node labels: morelab, OpenCourseWare]
LinkedUp: Data Catalog Features
VoID representations of datasets include the following information:
Federated query interface:
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX aiiso: <http://purl.org/vocab/aiiso/schema#>
SELECT DISTINCT ?endpoint WHERE {
  ?ds void:sparqlEndpoint ?endpoint .
  {
    { ?ds void:classPartition [ void:class aiiso:School ] }
    UNION
    { ?ds void:subset [ void:classPartition [ void:class aiiso:School ] ] }
  }
}
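The catalog query above can be generalised to any class and its result consumed programmatically. The following is a minimal sketch, assuming the endpoint returns the standard SPARQL 1.1 JSON results format; the function names `endpoints_for_class_query` and `parse_endpoints` are hypothetical helpers, not part of the LinkedUp catalog.

```python
import json

VOID = "http://rdfs.org/ns/void#"


def endpoints_for_class_query(class_iri: str) -> str:
    """Build the catalog query for datasets declaring `class_iri` in a VoID class partition."""
    return f"""PREFIX void: <{VOID}>
SELECT DISTINCT ?endpoint WHERE {{
  ?ds void:sparqlEndpoint ?endpoint .
  {{
    {{ ?ds void:classPartition [ void:class <{class_iri}> ] }}
    UNION
    {{ ?ds void:subset [ void:classPartition [ void:class <{class_iri}> ] ] }}
  }}
}}"""


def parse_endpoints(sparql_json: str) -> list:
    """Extract the ?endpoint values from a SPARQL 1.1 JSON results document."""
    doc = json.loads(sparql_json)
    return [b["endpoint"]["value"] for b in doc["results"]["bindings"] if "endpoint" in b]


# Build the query for aiiso:School, as on the slide.
query = endpoints_for_class_query("http://purl.org/vocab/aiiso/schema#School")

# A hand-written response in the standard JSON results shape (illustrative only).
sample = json.dumps({
    "head": {"vars": ["endpoint"]},
    "results": {"bindings": [
        {"endpoint": {"type": "uri", "value": "http://data.open.ac.uk/query"}}
    ]},
})
endpoints = parse_endpoints(sample)
```

Sending `query` to the catalog endpoint is left out on purpose; any HTTP client that posts the query and requests `application/sparql-results+json` would fit here.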
LinkedUp: Why dataset profiling?
A few characteristics of linked datasets (from the Linked Open Data Cloud):
Growing number of datasets: 227 datasets
Data represented as triples: 31 billion triples
Multi-lingual content: 18 languages
Broad set of topics covered
Inter-dataset links
Domain                   Number of datasets   Triples          %         (Out-)Links
Media                    25                   1,841,852,061    5.82 %    50,440,705
Geographic               31                   6,145,532,484    19.43 %   35,812,328
Government               49                   13,315,009,400   42.09 %   19,343,519
Publications             87                   2,950,720,693    9.33 %    139,925,218
Cross-domain             41                   4,184,635,715    13.23 %   63,183,065
Life sciences            41                   3,036,336,004    9.60 %    191,844,090
User-generated content   20                   134,127,413      0.42 %    3,449,143
Total                    295                  31,634,213,770             503,998,829
Domains covered by “lod-cloud” datasets
LinkedUp: Why dataset profiling?
How do I find information about “renewable energy”?
31 billion resources
18 languages
180 organisations
How can we do that?
Check datasets that cover such a topic?
    Only 38 out of 228 datasets contain topic coverage information
Use a SPARQL FILTER clause?
    A regex(*) filter clause needs to scan all triples to find those that contain a specific keyword
What are all possible forms of renewable energy?
    Renewable energy: solar energy, wind energy, geothermal, ...
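The limitation of keyword filtering can be illustrated with a small sketch. It mimics a case-insensitive regex filter over literal values in plain Python; the triples and the `ex:` identifiers are made up for illustration and do not come from any of the datasets mentioned.

```python
import re

# Toy triples (subject, predicate, literal); all values are invented for illustration.
TRIPLES = [
    ("ex:plant1", "rdfs:label", "Renewable energy park"),
    ("ex:plant2", "rdfs:label", "Offshore wind farm"),
    ("ex:plant3", "rdfs:label", "Solar energy plant"),
    ("ex:doc1", "dc:title", "Coal mining report"),
]


def regex_scan(triples, keyword):
    """Mimic a case-insensitive regex filter: every triple is inspected once."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [s for s, _, o in triples if pattern.search(o)]


hits = regex_scan(TRIPLES, "renewable energy")
print(hits)
```

Only the resource whose label literally contains the keyword is returned; the wind and solar resources are missed although they are forms of renewable energy, and the scan still touches every triple.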
LinkedUp: How to profile Linked Data?
What is a linked data profile?
Linked dataset profiles consist of structured information describing the topic coverage of a dataset. A profile is represented as a graph whose vertices are datasets, resources, and topics. Edges connect the pairs ‹dataset, resource› and ‹resource, topic›. The edges between resources and topics are weighted, conveying the relevance of a topic for a dataset.
Profile Definition
A dataset consists of a set of resource instances:
    <resource_uri_1>
    <resource_uri_2>
    ...
    <resource_uri_n>
A resource is represented by a set of triples:
    <resource_uri_1> ?predicate_x value
    <resource_uri_1> ?predicate_y value
    <resource_uri_1> ?predicate_z value
A topic is equivalent to a DBpedia category associated with one of the resource values.
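The profile definition can be pictured as a small data structure. This is a minimal sketch, not the authors' implementation: the class name `ProfileGraph`, its methods, the `ex:` and `dbc:` identifiers, and the choice to sum resource-level weights into a dataset-level weight are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ProfileGraph:
    # dataset -> set of resource URIs (unweighted <dataset, resource> edges)
    dataset_resources: dict = field(default_factory=lambda: defaultdict(set))
    # resource -> {topic: weight} (weighted <resource, topic> edges)
    resource_topics: dict = field(default_factory=lambda: defaultdict(dict))

    def add_resource(self, dataset, resource):
        self.dataset_resources[dataset].add(resource)

    def add_topic(self, resource, topic, weight):
        self.resource_topics[resource][topic] = weight

    def topics_of(self, dataset):
        """Aggregate topic weights over a dataset's resources (summing is an assumption)."""
        totals = defaultdict(float)
        for resource in self.dataset_resources[dataset]:
            for topic, weight in self.resource_topics.get(resource, {}).items():
                totals[topic] += weight
        return dict(totals)


g = ProfileGraph()
g.add_resource("ex:dataset", "ex:r1")
g.add_resource("ex:dataset", "ex:r2")
g.add_topic("ex:r1", "dbc:Renewable_energy", 0.5)
g.add_topic("ex:r2", "dbc:Renewable_energy", 0.25)
g.add_topic("ex:r2", "dbc:Wind_power", 0.25)
profile = g.topics_of("ex:dataset")
```

Keeping the two edge kinds in separate maps mirrors the definition: topics are reachable from a dataset only through its resources.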
LinkedUp: Profiling Linked Data
i. Metadata extraction
ii. Sampling of resource instances
iii. Entity and topic extraction
iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)
v. Weighted dataset-topic profile graphs
vi. Profiles representation
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).
Profiling Linked Data – (I)
i. Metadata extraction
    DataHub’s CKAN API
ii. Sampling of resource instances
    Weighted, random, centrality
iii. Entity and topic extraction
    Consider only the textual values assigned to a resource
    NER: extract and disambiguate named entities (DBpedia Spotlight)
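The sampling step can be sketched as follows. The slide only names the strategies, so this covers two of them under stated assumptions: random sampling is uniform without replacement, and the weighted variant takes an arbitrary positive weight per resource (for example the number of textual values, which is an assumption). Centrality-based sampling is omitted because the slide does not define it. The function names are hypothetical.

```python
import random


def random_sample(resources, k, seed=0):
    """Uniform sample of up to k resource URIs without replacement."""
    rng = random.Random(seed)
    pool = sorted(resources)  # sort for reproducibility across runs
    return rng.sample(pool, min(k, len(pool)))


def weighted_sample(weights, k, seed=0):
    """Weighted sample without replacement (Efraimidis-Spirakis keys).

    `weights` maps resource URI -> positive weight; resources with larger
    weights are more likely to be selected.
    """
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), r) for r, w in sorted(weights.items()) if w > 0]
    keyed.sort(reverse=True)
    return [r for _, r in keyed[:k]]


# Hypothetical weights, e.g. the number of literal values per resource.
weights = {"ex:r1": 1, "ex:r2": 5, "ex:r3": 10, "ex:r4": 2}
picked = weighted_sample(weights, 2, seed=42)
```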
Profiling Linked Data – (II)
iv. Topic ranking (PageRank with Priors, HITS with Priors and K-Step Markov)
    Rank the topics of each dataset and compute their relevance with respect to the associated resources
v. Weighted dataset-topic profile graph
    The topic weights computed for each dataset are the weights of the edges <dataset, topic>
vi. Profiles representation (Vocabulary of Interlinked Datasets (VoID) and Vocabulary of Links (VoL))
    VoID: captures information about a linked dataset as a set of links
    VoL: defines a link (of entity or topic type), along with its provenance information and relevance score
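Of the three ranking approaches named above, PageRank with Priors is the easiest to sketch. The code below is a textbook-style power iteration, not the authors' implementation: the graph shape (resource to entity to category), the back-probability `beta = 0.15`, the fixed iteration count, and the handling of dangling nodes are all assumptions, and the node names are invented.

```python
def pagerank_with_priors(edges, priors, beta=0.15, iterations=50):
    """Power iteration for PageRank with a non-uniform restart (prior) distribution.

    edges:  {node: [successor, ...]}
    priors: {node: prior probability}, expected to sum to 1
    beta:   probability of jumping back to the prior distribution
    """
    nodes = set(edges) | {v for vs in edges.values() for v in vs} | set(priors)
    prior = {n: priors.get(n, 0.0) for n in nodes}
    rank = dict(prior)  # start the walk from the prior distribution
    for _ in range(iterations):
        nxt = {n: beta * prior[n] for n in nodes}
        for u in nodes:
            succ = edges.get(u, [])
            if succ:
                share = (1.0 - beta) * rank[u] / len(succ)
                for v in succ:
                    nxt[v] += share
            else:
                # Dangling node: return its mass to the priors (an assumption).
                for n in nodes:
                    nxt[n] += (1.0 - beta) * rank[u] * prior[n]
        rank = nxt
    return rank


# Hypothetical graph: two sampled resources, two extracted entities, one category.
edges = {
    "r1": ["Solar_energy"],
    "r2": ["Solar_energy", "Wind_power"],
    "Solar_energy": ["Renewable_energy"],
    "Wind_power": ["Renewable_energy"],
}
priors = {"r1": 0.5, "r2": 0.5}
rank = pagerank_with_priors(edges, priors)
```

Placing the prior mass on the sampled resources biases the walk towards topics reachable from them, which is one way to read "relevance with respect to the associated resources".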
Profiling Linked Data: Representation Example
[Figure: annotated example of a dataset profile. Callouts: dataset profile metadata; the dataset’s profile and index; entity type link (extracted entity, provenance information (resources) for the entity link); topic type link (extracted topic, topic relevance score, provenance information (entities) for the topic link)]
SELECT ?dataset ?link ?score ?link_1 ?entity ?resource
WHERE {
  ?dataset a void:Linkset .
  ?dataset vol:hasLink ?link .
  ?link vol:linksResource <http://dbpedia.org/resource/Category:Renewable_energy> .
  ?link vol:derivedFrom ?entity .
  ?link vol:hasScore ?score .
  ?link_1 vol:linksResource ?entity .
  ?dataset vol:hasLink ?link_1 .
  ?link_1 vol:derivedFrom ?resource
}
ORDER BY DESC(?score)
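To make the traversal in this query concrete, the sketch below evaluates the same join pattern over a handful of in-memory triples. The predicates are the ones used in the query; every identifier in the toy data (`ex:ds`, `ex:topicLink`, `dbr:Wind_farm`, the score) is hypothetical, and the helper functions are not part of any library.

```python
# Toy triples using the predicates from the query; all identifiers are invented.
TRIPLES = [
    ("ex:ds", "rdf:type", "void:Linkset"),
    ("ex:ds", "vol:hasLink", "ex:topicLink"),
    ("ex:topicLink", "vol:linksResource", "dbc:Renewable_energy"),
    ("ex:topicLink", "vol:derivedFrom", "dbr:Wind_farm"),
    ("ex:topicLink", "vol:hasScore", 0.8),
    ("ex:ds", "vol:hasLink", "ex:entityLink"),
    ("ex:entityLink", "vol:linksResource", "dbr:Wind_farm"),
    ("ex:entityLink", "vol:derivedFrom", "ex:resource1"),
]


def objects(s, p):
    return [o for s2, p2, o in TRIPLES if s2 == s and p2 == p]


def subjects(p, o):
    return [s for s, p2, o2 in TRIPLES if p2 == p and o2 == o]


def datasets_for_topic(topic):
    """Follow topic link -> entity -> entity link -> resource, as the query does."""
    rows = []
    for dataset in subjects("rdf:type", "void:Linkset"):
        links = objects(dataset, "vol:hasLink")
        for link in links:
            if topic not in objects(link, "vol:linksResource"):
                continue
            for entity in objects(link, "vol:derivedFrom"):
                for score in objects(link, "vol:hasScore"):
                    for link_1 in links:
                        if entity in objects(link_1, "vol:linksResource"):
                            for resource in objects(link_1, "vol:derivedFrom"):
                                rows.append((dataset, link, score, link_1, entity, resource))
    # Equivalent of ORDER BY DESC(?score)
    return sorted(rows, key=lambda row: row[2], reverse=True)


rows = datasets_for_topic("dbc:Renewable_energy")
```

Each row pairs a topic link with the entity it was derived from and then with the resource that entity was extracted from, which is the provenance chain the query exposes.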
How are the profiles useful?
• “Renewable Energy” comes in different forms:
    • Solar Energy
    • Wind-farms
    • Biogas
    • Hydroelectricity etc.
http://enipedia.tudelft.nl/wiki/Windmar_Renewable_Energy
http://enipedia.tudelft.nl/data/page/eGRID/Plant/57050
http://enipedia.tudelft.nl/wiki/Us_Energy_Biogas_Corp
http://www.reegle.info/profiles/JP
How do I find information about “renewable energy”?
Profiling Linked Data: Evaluation
7 April 2023, Stefan Dietze
Profiling accuracy of the different ranking approaches, using the full sample of analysed resource instances, with the NDCG score averaged over all datasets.
Correlation between ranking accuracy (∆NDCG, averaged over all datasets) and ranking time.
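For readers unfamiliar with the metric, NDCG can be computed as sketched below. Several DCG variants exist and the slides do not state which one was used, so the linear-gain, log2-discount form here is an assumption; the relevance grades in the example are invented.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))


def ndcg(relevances, k=None):
    """NDCG of a ranked list of graded relevance judgements, optionally cut at rank k."""
    ranked = relevances if k is None else relevances[:k]
    ideal = sorted(relevances, reverse=True)[:len(ranked)]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


# Hypothetical judgements for the top topics of one dataset, in ranked order.
perfect = ndcg([3, 2, 1])
reversed_order = ndcg([1, 2, 3])
```

Averaging `ndcg` over the per-dataset rankings gives a single figure per ranking approach, as in the first caption above.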
Profiling Linked Data: Example use cases
Type-specific views on datasets/categories: “Document” (foaf:Document), “Person” (foaf:Person), “Course” (aiiso:Course)
Available for the LinkedUp Catalog only (schema mappings are already available there)
Exploratory functionalities over the dataset profiles
Available for the LinkedUp Catalog and the LOD Cloud
Cite4Me and LinkedUp Challenge
Semantic Search and Retrieval of Publications
Semantic Search
Graph Search
Paper Recommendation
In-depth Analysis
Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013)
LinkedUp: Veni Challenge
DataConf.
KnowNodes
Mismuseos
ReCredible
YourHistory
PoliMedia: 1st prize (http://www.polimedia.nl/)
GlobeTown: 2nd prize (http://www.globe-town.org/)
WeShare: 3rd prize / people's choice (http://seek.cloud.gsic.tel.uva.es/weshare/)
Demos and Other Resources
Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications. Bernardo Pereira Nunes, Besnik Fetahu, Stefan Dietze, and Marco Antonio Casanova. Proceedings of the 12th International Semantic Web Conference, Sydney, Australia, (2013)
A Scalable Approach for Efficiently Generating Structured Dataset Topic Profiles. Besnik Fetahu, Stefan Dietze, Bernardo Pereira Nunes, Marco Antonio Casanova, Davide Taibi, and Wolfgang Nejdl. In Proceedings of the 11th Extended Semantic Web Conference, Springer, 2014 (to appear).
Assessing the Educational Linked Data Landscape, D’Aquin, M., Adamou, A., Dietze, S., ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
LinkedUp Catalog: http://data.linkededucation.org/linkedup/catalog/
DevTalk LinkedUp: http://data.linkededucation.org/linkedup/devtalk/
LOD Profile Data: http://data-observatory.org/lod-profiles/sparql
LOD Profile Explorer: http://data-observatory.org/lod-profiles/profile-explorer
Cite4Me Application: http://www.cite4me.com/
LinkedUp Challenge: http://linkedup-challenge.org/