www.sigma.fr
SEMANTIC WEB WITH JAHIA
February 2014
SUMMARY
• WHY ?
– Background
– Web 2.0 is not enough
• WHAT ?
– Definitions
– It’s real
• HOW ?
– JAHIA fits
– Integration
WHY ?
• Background
• Web 2.0 is not enough
Thomas Delerm (SIGMA) and Adrien Di Mascio (Logilab) will explain the value of the semantic web in modern web applications for making the best use of your data.
They’ll give the recipes that make Jahia an appropriate CMS for the semantic and linked data web, a.k.a. "web 3.0".
Background: who we are
Adrien DI MASCIO - Semantic Web Director
Company: Logilab
Thomas DELERM - Web Architect
Company: SIGMA
Worked in mobile and IPTV content startups
How the web evolved
Web "1.0" was about documents and links
Web "2.0" is about social interaction and users
https://web.archive.org/web/19991116151216/http://www4.yahoo.com/
WHY ?
• Background
• Web 2.0 is not enough
Failures of Web 2.0
All the databases and APIs are in “silos”, so searches are limited
Results are documents, not objects
Are my results up to date and reliable?
Example: Renault: there are too many combinations when you want to buy a car, more than 10^20 [1]
[1] http://www.semweb.pro/talk/2474
Failures of Web 2.0
Web 2.0 is far from perfect:
User tags:
– Different spellings
– Different meanings for the same spelling (Hollande)
– No relationships between tags
You cannot answer, in a single request, complex queries such as “List 10 products on my website whose producer is Samsung and whose price is under $50”
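Once data is published as subject–predicate–object triples, the Samsung question becomes a single pattern query. The sketch below is a minimal illustration, assuming a toy in-memory list of triples with hypothetical product identifiers, predicates and prices; the matcher mimics a SPARQL basic graph pattern with a FILTER and a LIMIT, it is not a real SPARQL engine.

```python
# Sketch: answer the "Samsung products under $50" question over triples.
# The product IDs, predicates and prices are hypothetical toy data.
from typing import Iterator, Optional

Triple = tuple[str, str, object]

TRIPLES: list[Triple] = [
    ("product:1", "producer", "Samsung"),
    ("product:1", "price", 45.0),
    ("product:2", "producer", "Samsung"),
    ("product:2", "price", 120.0),
    ("product:3", "producer", "Sony"),
    ("product:3", "price", 30.0),
]


def is_var(term: object) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")


def match(patterns: list[Triple], bindings: Optional[dict] = None) -> Iterator[dict]:
    """Yield every variable binding that satisfies all the triple patterns."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if is_var(term):
                if candidate.setdefault(term, value) != value:
                    break  # variable already bound to something else
            elif term != value:
                break  # constant does not match
        else:
            yield from match(rest, candidate)


def cheap_products(producer: str, max_price: float, limit: int = 10) -> list[str]:
    pattern = [("?p", "producer", producer), ("?p", "price", "?price")]
    hits = [b["?p"] for b in match(pattern) if b["?price"] < max_price]  # like FILTER
    return hits[:limit]  # like LIMIT


print(cheap_products("Samsung", 50))  # ['product:1']
```

The join on `?p` is what a tag-based Web 2.0 search lacks: both conditions are checked against the same product.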
We have a solution
There is always a technical evolution
– From PC to Web : WWW and links
– From Web to Web 2.0 : AJAX (dynamic web sites)
– From Web 2.0 to Web 3.0 : Semantic properties and Linked data
So let’s learn what the semantic web is!
WHAT ?
• Definitions
• It’s real
Semantic Web – (Anti)definitions
Today, the Semantic Web is not:
Magic
Natural Language Processing
Automatic Image Processing
A new protocol
It is a worldwide network of data built upon a set of interoperable standards that use URLs to identify data and link them together.
No Natural Language Processing
A human reads:
<h1>Semantic Web</h1> <p>Semantic Web is a worldwide network of data invented by <a href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a> in 1994.</p>
A machine reads:
<h1> ????????????</h1> <p> ?????????????????????????????????????????????????? ?????<a href="http://w3.org/People/Berners-Lee"> ???????????????</a> ????????</p>
If only the machine could read:
SemanticWeb is_a network
SemanticWeb was_created_by TimBernersLee
SemanticWeb was_created_in 1994
Annotate your document
<p itemscope itemtype="http://www.w3.org/2004/02/skos/core#Concept">
<span itemprop="name">Semantic Web</span> is a
<span itemprop="description">worldwide network of data</span> invented by
<a itemprop="creator" href="http://w3.org/People/Berners-Lee">
Tim Berners-Lee</a>
in <span itemprop="creation_date">1994</span>.</p>
Use RDFa or schema.org
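Annotations only pay off if a program can read them back. Here is a minimal stdlib-only sketch, assuming a well-formed microdata version of the annotation (with `itemscope` on the item and `itemprop` on every annotated element). The `ItemPropExtractor` class is hypothetical and deliberately simplified: it ignores nested items and `itemref`, and is not a conformant microdata parser.

```python
# Sketch: read the itemprop annotations back out of the HTML (stdlib only).
# Simplified: no nested itemscope, no itemref, only <a href> as URL values.
from html.parser import HTMLParser

SNIPPET = """<p itemscope itemtype="http://www.w3.org/2004/02/skos/core#Concept">
<span itemprop="name">Semantic Web</span> is a
<span itemprop="description">worldwide network of data</span> invented by
<a itemprop="creator" href="http://w3.org/People/Berners-Lee">
Tim Berners-Lee</a>
in <span itemprop="creation_date">1994</span>.</p>"""


class ItemPropExtractor(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.props: dict[str, str] = {}
        self._open: list[tuple[str, str]] = []  # (tag, itemprop) awaiting text
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if prop is None:
            return
        if tag == "a" and "href" in attributes:
            # For <a itemprop>, microdata takes the href as the value.
            self.props[prop] = attributes["href"]
        else:
            self._open.append((tag, prop))
            self._text = []

    def handle_data(self, data):
        if self._open:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            _, prop = self._open.pop()
            # Collapse whitespace in the element's text content.
            self.props[prop] = " ".join("".join(self._text).split())


parser = ItemPropExtractor()
parser.feed(SNIPPET)
print(parser.props)
```

The creator comes out as a URL rather than the string "Tim Berners-Lee", which is exactly what lets another site link to the same person.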
Publish another representation
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
<http://mysite.com/SemanticWeb>
a <http://www.w3.org/2004/02/skos/core#Concept> ;
skos:closeMatch <http://data.bnf.fr/ark:/12148/cb119328992> ;
dc:creator <http://w3.org/People/Berners-Lee/> ;
dc:date "1994" .
More familiar with JSON? Take a look at JSON-LD.
Publish RDF and use HTTP content negotiation.
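Content negotiation lets the same URI serve HTML to browsers and Turtle or JSON-LD to machines. A minimal, framework-free sketch, assuming the example resource and values from the Turtle snippet; the function names are hypothetical and the `Accept` parsing is simplified (real HTTP servers also rank more specific media ranges ahead of wildcards).

```python
# Sketch: one URI, several representations chosen via the Accept header.
# Simplified negotiation: q-values and wildcards only, no specificity rules.
import json
from typing import Callable, Optional

RESOURCE = "http://mysite.com/SemanticWeb"
SKOS = "http://www.w3.org/2004/02/skos/core#"
DC = "http://purl.org/dc/elements/1.1/"
CREATOR = "http://w3.org/People/Berners-Lee/"


def to_html() -> str:
    # Placeholder for the annotated HTML page (RDFa / microdata).
    return f'<p about="{RESOURCE}">Semantic Web</p>'


def to_turtle() -> str:
    return (
        f"@prefix skos: <{SKOS}> .\n"
        f"@prefix dc: <{DC}> .\n"
        f"<{RESOURCE}> a skos:Concept ;\n"
        f"    dc:creator <{CREATOR}> ;\n"
        '    dc:date "1994" .\n'
    )


def to_jsonld() -> str:
    return json.dumps({
        "@context": {"skos": SKOS, "dc": DC},
        "@id": RESOURCE,
        "@type": "skos:Concept",
        "dc:creator": {"@id": CREATOR},
        "dc:date": "1994",
    }, indent=2)


SERIALIZERS: dict[str, Callable[[], str]] = {
    "text/html": to_html,
    "text/turtle": to_turtle,
    "application/ld+json": to_jsonld,
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media range, q) pairs, highest q first (stable for ties)."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media:
            ranges.append((media, q))
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def negotiate(header: str) -> Optional[str]:
    for media, q in parse_accept(header):
        if q <= 0:
            continue  # q=0 means "not acceptable"
        for available in SERIALIZERS:
            if media in ("*/*", available) or (
                media.endswith("/*") and available.startswith(media[:-1])
            ):
                return available
    return None


def respond(accept: str) -> tuple[int, str, str]:
    media = negotiate(accept)
    if media is None:
        return 406, "text/plain", "Not Acceptable"
    return 200, media, SERIALIZERS[media]()
```

For example, `respond("text/html;q=0.5, text/turtle;q=0.9")` returns the Turtle representation, while a browser sending `*/*` gets the HTML page.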
Vocabularies, ontologies
An ontology is a structured set of terms and concepts.
Each term and concept is also identified by a URL
There are quite a few standard ontologies for various domains (social interactions, libraries, music, events, etc.)
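Because every term is a URL, a short name such as `dc:creator` is only an abbreviation for a full vocabulary URL. A small sketch of that expansion, assuming a hand-written prefix table (the namespaces are the published SKOS, Dublin Core elements and FOAF ones; the helper names are hypothetical):

```python
# Sketch: vocabulary terms are URLs; "dc:creator" is just an abbreviation.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie: str) -> str:
    """Turn 'prefix:local' into the full term URL."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def compact(url: str) -> str:
    """Turn a term URL back into 'prefix:local' (longest namespace wins)."""
    for prefix, namespace in sorted(PREFIXES.items(), key=lambda kv: len(kv[1]), reverse=True):
        if url.startswith(namespace):
            return f"{prefix}:{url[len(namespace):]}"
    return url  # no known namespace: keep the full URL


print(expand("skos:Concept"))  # http://www.w3.org/2004/02/skos/core#Concept
print(compact("http://xmlns.com/foaf/0.1/name"))  # foaf:name
```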
Make it happen now !
RDF is nice
– Some database engines store RDF graphs
– You can query them with the SPARQL language
– Standardized by the W3C
You don't necessarily need to change your technology stack
– If your data is structured, publishing RDF is easy
– Choosing an ontology or a vocabulary can be hard
– Making your relational database answer a SPARQL query is hard
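The "easy" direction, publishing structured data as RDF, can be as small as a loop over rows. A minimal sketch, assuming a hypothetical `concept` table, base URI and hand-written column-to-predicate mapping (loosely in the spirit of W3C R2RML, not an implementation of it). The hard reverse direction, answering SPARQL directly over the relational tables, is not shown.

```python
# Sketch: publish rows of a relational table as RDF (N-Triples).
# The table, base URI and column-to-predicate mapping are hypothetical.
import sqlite3

BASE = "http://mysite.com/"
DC = "http://purl.org/dc/elements/1.1/"

# Choosing these predicates is the "choosing a vocabulary" step.
COLUMN_TO_PREDICATE = {
    "title": DC + "title",
    "creator": DC + "creator",
    "year": DC + "date",
}


def literal(value: object) -> str:
    """Quote a value as an N-Triples plain literal."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def table_to_ntriples(conn: sqlite3.Connection, table: str) -> list[str]:
    # `table` must be a trusted identifier: it is interpolated into the SQL.
    columns = ", ".join(["id", *COLUMN_TO_PREDICATE])
    lines = []
    for row in conn.execute(f"SELECT {columns} FROM {table} ORDER BY id"):
        subject = f"<{BASE}{table}/{row[0]}>"
        for column, value in zip(COLUMN_TO_PREDICATE, row[1:]):
            if value is not None:  # NULL: simply no triple
                lines.append(f"{subject} <{COLUMN_TO_PREDICATE[column]}> {literal(value)} .")
    return lines


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE concept (id INTEGER PRIMARY KEY, title TEXT, creator TEXT, year INTEGER)")
conn.execute("INSERT INTO concept VALUES (1, 'Semantic Web', 'Tim Berners-Lee', 1994)")
print("\n".join(table_to_ntriples(conn, "concept")))
```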
WHAT ?
• Definitions
• It’s real
It's all about data
Publishing structured data:
• Helps search engines
– Better indexing
– Better page rank
• Eases external data integration
– Importing a CSV file requires a preliminary agreement on its structure
– Maintaining data is expensive: reuse published data (DBpedia, Freebase, GeoNames)
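Reusing published data usually means querying a public SPARQL endpoint instead of negotiating a CSV layout. A minimal sketch that only builds the HTTP request, assuming the public DBpedia endpoint and an illustrative query; it uses the SPARQL 1.1 Protocol form (GET with a `query` parameter) and asks for the standard SPARQL JSON results format.

```python
# Sketch: reuse published data by querying a public SPARQL endpoint.
# Builds the request only (SPARQL 1.1 Protocol, GET with a `query` parameter).
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?population WHERE {
  <http://dbpedia.org/resource/Paris> dbo:populationTotal ?population .
}"""


def sparql_request(endpoint: str, query: str) -> Request:
    url = f"{endpoint}?{urlencode({'query': query})}"
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_request(ENDPOINT, QUERY)
print(req.full_url)
# To actually run it (network access needed):
#   import json, urllib.request
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["results"]["bindings"])
```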
Examples
GoodRelations annotations
Schema.org annotations
HOW ?
• Jahia fits
• Integration
Client case: Bpi
One goal: use state-of-the-art Semantic Web technologies, since Bpi is a library (Bibliothèque publique d’information)
3 main needs:
– Input data easily, for contents and within contents
– Store data in a safe, RDF-friendly manner
– Output data
• On every page for SEO (RDFa)
• In searches
• In exports (RDF)
Good news: Jahia fits!
The choice of Jahia
Input:
– Jahia lets you define clear content definitions (CND files) with inheritance
– Jahia is content-centric
Enrich within contents: CKEditor
Enrich on contents: contribution or edition (GWT) modes
The choice of Jahia: storage and output
Storage: you need a framework that can abstract different sources of data: enter JCR
– Unique repository for all content
– External data sources are abstracted: LDAP, files, other DBs…
Output:
– Graph structure + XML format fit for metadata
– JSP views can easily be tailored for special export formats
HOW ?
• Jahia fits
• Integration
Input: CKEditor and categories
Make sure text data is stored as plain HTML
– Properties file to map schema.org HTML code
– In-content schema.org properties: created a CKEditor plugin
Triple categorization of contents:
– Categories (closed list)
– Tags (open)
– Authorities (closed, linked with BnF)
Next steps:
– Need for a triple store?
– Categorization through automatic spider browsing?
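The three categorization axes behave differently: closed lists reject unknown values, open tags accept anything but should merge spelling variants. A minimal sketch under stated assumptions: the category list and the `categorize` helper are hypothetical, and the single authority entry reuses the BnF ARK that the Turtle example links to the SemanticWeb concept. Note that folding case and accents only fixes spelling variants; homonyms such as "Hollande" still need an authority record to be disambiguated.

```python
# Sketch of the three categorization axes; every list here is hypothetical.
import unicodedata

CATEGORIES = {"events", "resources", "news"}  # closed list
# Authorities: closed list, each entry linked to a BnF record (ARK URI).
AUTHORITIES = {"semantic web": "http://data.bnf.fr/ark:/12148/cb119328992"}


def normalize(label: str) -> str:
    """Fold case, accents and spacing so spelling variants collapse."""
    decomposed = unicodedata.normalize("NFKD", label)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.casefold().split())


def categorize(category: str, tags: list[str], authority_labels: list[str]) -> dict:
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")  # closed: reject
    links = []
    for label in authority_labels:
        key = normalize(label)
        if key not in AUTHORITIES:
            raise ValueError(f"no authority record for {label!r}")  # closed: reject
        links.append(AUTHORITIES[key])
    return {
        "category": category,
        "tags": sorted({normalize(t) for t in tags}),  # open: accept, merge variants
        "authorities": links,
    }


print(categorize("events", ["Web Sémantique", "web  semantique", "Jahia"], ["Semantic Web"]))
```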
Content structure
Directories per category
The semantic mapping is transparent: no additional field to fill in
Properties files map each field to its semantic exports (Dublin Core, FOAF...)
Challenges met:
– Where to store a file's metadata: extend jnt:file
– How to create a sub-content while creating its parent: edit the Spring GWT XML
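The "transparent" mapping idea can be pictured as a lookup from ordinary content fields to vocabulary terms. A minimal sketch, assuming a hypothetical `key = value` mapping file (not Jahia's actual format), hypothetical field names, and a simplified reader that is not a full `java.util.Properties` parser; field names deliberately avoid `:` because Java properties also treat it as a separator.

```python
# Sketch: a properties-style file maps content fields to vocabulary terms,
# so editors fill in ordinary fields and the semantic export comes for free.
# File format, field names and terms are hypothetical.
MAPPING_PROPERTIES = """
# field = comma-separated semantic exports
title = dc:title, schema:name
author = dc:creator
summary = dc:description
"""


def parse_mapping(text: str) -> dict[str, list[str]]:
    """Simplified key = value reader (not a full java.util.Properties parser)."""
    mapping: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            mapping[key.strip()] = [t.strip() for t in value.split(",") if t.strip()]
    return mapping


def export(subject: str, fields: dict[str, str],
           mapping: dict[str, list[str]]) -> list[tuple[str, str, str]]:
    """One (subject, term, value) statement per mapped term; unmapped fields are skipped."""
    return [
        (subject, term, value)
        for field, value in fields.items()
        for term in mapping.get(field, [])
    ]


mapping = parse_mapping(MAPPING_PROPERTIES)
node = {"title": "Semantic Web", "author": "Tim Berners-Lee", "internalNote": "draft"}
for statement in export("http://mysite.com/SemanticWeb", node, mapping):
    print(statement)
```

Adding a vocabulary then means editing the mapping file, not the content type or the editing forms.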
Vocabularies used
Page | Schema.org | OpenGraph | Dublin Core | FOAF
Lists | No | No | No | No
Details on short and long contents | Yes | Yes | Yes | Partial
Details: events, IT resource [file] | Yes | No | Yes | No
Authors | No | No | Yes | Yes
Place in HTML | Everywhere | Header | Header | Everywhere
Format in HTML | RDFa | Meta | Meta | RDFa
In RDF | Yes | Yes, one line per meta | Yes, one line per meta | Yes, native
Contributed by | Automatic + manual (Bpi) | Automatic (mapping) | Automatic (mapping) | Automatic (mapping)
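OpenGraph and Dublin Core live in the page header as `<meta>` tags. A minimal sketch of that output step, assuming hypothetical page fields and a fixed `og:type`; it uses the Open Graph `property="og:*"` convention and the Dublin Core `DC.*` meta-name convention with its `schema.DC` link, and escapes values with the standard library.

```python
# Sketch: emit the "header" vocabularies as <meta> tags.
# Page fields are hypothetical; Dublin Core uses the DC.* meta-name convention.
from html import escape

OG_FIELDS = {"title": "og:title", "url": "og:url", "description": "og:description"}
DC_FIELDS = {"title": "DC.title", "creator": "DC.creator", "date": "DC.date"}


def head_meta(page: dict[str, str]) -> list[str]:
    tags = [
        '<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">',
        '<meta property="og:type" content="article">',
    ]
    for field, prop in OG_FIELDS.items():
        if field in page:
            tags.append(f'<meta property="{prop}" content="{escape(page[field])}">')
    for field, name in DC_FIELDS.items():
        if field in page:
            tags.append(f'<meta name="{name}" content="{escape(page[field])}">')
    return tags


page = {"title": "Semantic Web & Jahia", "url": "http://mysite.com/SemanticWeb",
        "creator": "Tim Berners-Lee", "date": "1994"}
print("\n".join(head_meta(page)))
```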
Output
We chose RDFa because, for now, it is more widely used than microdata
Debate: should enrichment be done manually? Automatically? Through a mixed technology?
The dc:xxx field mapping will be used to improve search results
“ARK” URIs are used to exchange objects between repositories (internal, Jahia, external such as BnF)
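With ARK (Archival Resource Key) identifiers, the host part of the URL is only a resolver: the authority number (NAAN) plus the assigned name identify the object, so two repositories can recognize the same record. A simplified sketch, assuming a hypothetical internal host; the parser accepts both `ark:/` and the newer `ark:` form but does not validate qualifiers.

```python
# Sketch: compare objects across repositories by their ARK, not their host.
# Simplified parser; the internal host below is hypothetical.
import re
from typing import NamedTuple, Optional


class Ark(NamedTuple):
    naan: str  # Name Assigning Authority Number (12148 is the BnF's)
    name: str  # assigned name, possibly with qualifiers

    def key(self) -> str:
        """Host-independent identity of the object."""
        return f"ark:/{self.naan}/{self.name}"


ARK_RE = re.compile(r"ark:/?([^/]+)/([^?#]+)")


def parse_ark(uri: str) -> Optional[Ark]:
    match = ARK_RE.search(uri)
    return Ark(match.group(1), match.group(2)) if match else None


bnf = parse_ark("http://data.bnf.fr/ark:/12148/cb119328992")
local = parse_ark("http://intranet.example/ark:/12148/cb119328992")
print(bnf.key(), bnf.key() == local.key())  # ark:/12148/cb119328992 True
```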
Future
Free your data!
– Put them together
– Share them between applications and externally
It forces you to organize your IT differently
Future: Facebook
Facebook is gradually promoting the posts that contain Open Graph data [1]
“Facebook testing more uses for Open Graph” [2]
[1] http://newsroom.fb.com/News/787/News-Feed-FYI-What-Happens-When-You-See-More-Updates-from-Friends (January 21, 2014)
[2] http://allfacebook.com/add-to-my-movies-link_b128387
Future : Web 3.0
Conclusion
“If you’re not paying for it, you are the product” [1]
The Semantic Web is going to be imposed by the Internet giants because they need it to know you better
Take the first step and enrich your data: don’t miss the train!
Jahia 7 is on board:
– External data provider
– Quality, extensible editor
[1] http://blogs.law.harvard.edu/futureoftheinternet/2012/03/21/meme-patrol-when-something-online-is-free-youre-not-the-customer-youre-the-product/
Webography:
New W3C Blog on Semantic Web & linked data : http://www.w3.org/blog/data/
http://fr.slideshare.net/AntidotNet/time2-market-lyon-13nov2013-slideshare#
http://fr.slideshare.net/terraces/technologies-du-web-smantique-pour-lentreprise-20
http://fr.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver
Questions & Answers