Web Semantics with Jahia

www.sigma.fr SEMANTIC WEB WITH JAHIA February 2014

description

State of the art of adding web semantics and linked data to a CMS like Jahia 6.6. Presentation given at the JahiaOne event in February 2014 with A. Di Mascio (Logilab).

Transcript of Web Semantics with Jahia

Page 1: Web Semantics with Jahia


SEMANTIC WEB WITH JAHIA

February 2014

Page 2: Web Semantics with Jahia

SUMMARY

• WHY?
  • Background
  • Web 2.0 is not enough
• WHAT?
  • Definitions
  • It’s real
• HOW?
  • Jahia fits
  • Integration

Page 3: Web Semantics with Jahia

WHY?

• Background

• Web 2.0 is not enough

Page 4: Web Semantics with Jahia

Background: who are we?

Thomas Delerm (SIGMA) and Adrien Di Mascio (Logilab) explain why web semantics matters in modern web applications and how it helps you make the best use of your data.

They give the recipes that make Jahia an appropriate CMS for the semantic and linked data web, a.k.a. "Web 3.0".

Adrien DI MASCIO - Semantic Web Director

Company: Logilab

Thomas DELERM - Web Architect

Company: SIGMA

Worked in cell phone and IPTV content startups

Page 5: Web Semantics with Jahia

How the web evolved

Web "1" was about documents and links

Web "2.0" is about social interaction and users

https://web.archive.org/web/19991116151216/http://www4.yahoo.com/

Page 6: Web Semantics with Jahia

WHY?

• Background

• Web 2.0 is not enough

Page 7: Web Semantics with Jahia

Failures of Web 2.0

All databases and APIs live in silos, so searches are limited

Results are documents, not objects

Are my results up to date and reliable?

Example: Renault. There are too many combinations when you want to buy a car: more than 10^20 [1]

[1] http://www.semweb.pro/talk/2474

Page 8: Web Semantics with Jahia

Failures of Web 2.0

Web 2.0 is far from perfect:

User tags
– Different spellings
– Different meanings for the same spelling (Hollande)
– No relationships between tags

You cannot answer, in one request, complex queries like “List on my website 10 products whose producer is Samsung and whose price is under $50”

Page 9: Web Semantics with Jahia

We have a solution

There is always a technical evolution

– From PC to Web: WWW and links
– From Web to Web 2.0: AJAX (dynamic web sites)
– From Web 2.0 to Web 3.0: semantic properties and linked data

So let’s learn what the semantic web is!

Page 10: Web Semantics with Jahia

WHAT?

• Definitions

• It’s real

Page 11: Web Semantics with Jahia

Semantic Web – (Anti)definitions

Today, the Semantic Web is not:

– Magic
– Natural Language Processing
– Automatic image processing
– A new protocol

It's a worldwide network of data built upon a set of interoperable standards that use URLs to identify data and link them together.

Page 12: Web Semantics with Jahia

No Natural Language Processing

A human reads:

<h1>Semantic Web</h1> <p>Semantic Web is a worldwide network of data invented by <a href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a> in 1994.</p>

A machine reads:

<h1>????????????</h1> <p>?????????????????????????????????????????????????? ?????<a href="http://w3.org/People/Berners-Lee">???????????????</a> ????????</p>

Page 13: Web Semantics with Jahia

If only ...

… the machine could read:

SemanticWeb is_a network

SemanticWeb was_created_by TimBernersLee

SemanticWeb was_created_in 1994
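The three statements on this slide are subject / predicate / object triples. As a minimal sketch (plain Python tuples, not an RDF library; the `match` helper is a hypothetical name introduced here), this is roughly what "the machine could read" means: once the facts are triples, a question becomes a pattern with wildcards.

```python
# Each statement from the slide becomes a (subject, predicate, object) tuple.
TRIPLES = [
    ("SemanticWeb", "is_a", "network"),
    ("SemanticWeb", "was_created_by", "TimBernersLee"),
    ("SemanticWeb", "was_created_in", "1994"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# "Who created the Semantic Web?" expressed as a pattern on the predicate.
creators = [o for (_, _, o) in match(TRIPLES, predicate="was_created_by")]
print(creators)
```

Real triples use URLs instead of short names, which is what lets data from different sites be linked together.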

Page 14: Web Semantics with Jahia


Annotate your document

<p itemtype="Concept">
<span itemprop="name">Semantic Web</span> is
<span itemprop="description">a worldwide network of data</span> invented by
<a itemprop="creator" href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a>
in <span itemprop="creation_date">1994</span>.</p>

Use RDFa or schema.org
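To illustrate why annotations help, here is a small sketch of what a consumer of the page could do with them. It uses only Python's standard `html.parser`; the `ItempropCollector` class is a hypothetical helper, it handles flat (non-nested) properties only, and it reads the element text, whereas a real microdata parser would take the `href` as the value of a link.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect {itemprop: text} pairs from annotated HTML (flat, non-nested)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # itemprop of the element being read, if any

    def handle_starttag(self, tag, attrs):
        itemprop = dict(attrs).get("itemprop")
        if itemprop:
            self._current = itemprop
            self.properties[itemprop] = ""

    def handle_data(self, data):
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current property.
        self._current = None


# A cleaned-up version of the annotated paragraph from the slide.
ANNOTATED = (
    '<p itemtype="Concept">'
    '<span itemprop="name">Semantic Web</span> is '
    '<span itemprop="description">a worldwide network of data</span> invented by '
    '<a itemprop="creator" href="http://w3.org/People/Berners-Lee">Tim Berners-Lee</a> '
    'in <span itemprop="creation_date">1994</span>.</p>'
)

collector = ItempropCollector()
collector.feed(ANNOTATED)
collector.close()
properties = {name: text.strip() for name, text in collector.properties.items()}
print(properties)
```

The machine no longer has to understand the sentence: it only has to read the attribute names.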

Page 15: Web Semantics with Jahia


Publish another representation

@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://mysite.com/SemanticWeb>
  a skos:Concept ;
  skos:closeMatch <http://data.bnf.fr/ark:/12148/cb119328992> ;
  dc:creator <http://w3.org/People/Berners-Lee/> ;
  dc:date "1994" .

More familiar with JSON? Take a look at JSON-LD

Publish RDF and use HTTP content negotiation
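As a rough sketch of these two ideas together, the block below holds a JSON-LD view of the same resource and picks a representation from the HTTP `Accept` header. Assumptions are loud here: the `dc` prefix is bound to the Dublin Core elements namespace, `negotiate` is a hypothetical helper rather than a Jahia or library API, and q-values are ignored (the first acceptable type listed wins), which a real server should not do.

```python
import json

RESOURCE_URL = "http://mysite.com/SemanticWeb"

# JSON-LD view of the resource described on the slide (same URLs, same properties).
JSON_LD = {
    "@context": {
        "skos": "http://www.w3.org/2004/02/skos/core#",
        "dc": "http://purl.org/dc/elements/1.1/",  # assumed binding for "dc"
    },
    "@id": RESOURCE_URL,
    "@type": "skos:Concept",
    "skos:closeMatch": {"@id": "http://data.bnf.fr/ark:/12148/cb119328992"},
    "dc:creator": {"@id": "http://w3.org/People/Berners-Lee/"},
    "dc:date": "1994",
}

TURTLE = """\
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .

<http://mysite.com/SemanticWeb>
  a skos:Concept ;
  skos:closeMatch <http://data.bnf.fr/ark:/12148/cb119328992> ;
  dc:creator <http://w3.org/People/Berners-Lee/> ;
  dc:date "1994" .
"""

# One URL, several representations keyed by media type.
REPRESENTATIONS = {
    "text/html": "<h1>Semantic Web</h1>",
    "text/turtle": TURTLE,
    "application/ld+json": json.dumps(JSON_LD, indent=2),
}


def negotiate(accept_header, default="text/html"):
    """Return the first media type of the Accept header that we can serve."""
    for item in accept_header.split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in REPRESENTATIONS:
            return media_type
    return default


chosen = negotiate("application/ld+json, text/html;q=0.5")
print(chosen)
print(REPRESENTATIONS[chosen])
```

A browser asking for `text/html` gets the page, while a data client asking for `text/turtle` or `application/ld+json` gets the data, all from the same URL.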

Page 16: Web Semantics with Jahia


Vocabularies, ontologies

An ontology is a structured set of terms and concepts.

Each term and concept is also identified by a URL

There are quite a few standard ontologies for various domains (social interactions, libraries, music, events, etc.)

Page 17: Web Semantics with Jahia

Make it happen now!

RDF is nice
– Some database engines store RDF graphs
– You can query them with the SPARQL language
– Standardized by W3C

You don't necessarily need to change your technology stack
– If your data is structured, publishing RDF is easy
– Choosing an ontology or a vocabulary can be hard
– Making your relational database answer a SPARQL query is hard
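To give a feel for "publishing RDF is easy" when the data is already structured, here is a hedged sketch that turns table rows into N-Triples lines. Everything specific is an assumption made for illustration: the `http://example.org/product/` namespace, the sample rows, and the naive rule that a column name maps to the schema.org property of the same name (which is exactly the "choosing a vocabulary" step the slide calls hard).

```python
BASE = "http://example.org/product/"  # hypothetical namespace for our records
SCHEMA = "http://schema.org/"

# Rows as they might come out of a relational table (sample data).
ROWS = [
    {"id": 1, "name": "Phone case", "price": "19.90"},
    {"id": 2, "name": 'USB "fast" cable', "price": "7.50"},
]


def escape(value):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def row_to_ntriples(row):
    """Emit one N-Triples line per column, using the row id to build the subject URL."""
    subject = f"<{BASE}{row['id']}>"
    lines = []
    for column, value in row.items():
        if column == "id":
            continue
        lines.append(f'{subject} <{SCHEMA}{column}> "{escape(str(value))}" .')
    return lines


for row in ROWS:
    for line in row_to_ntriples(row):
        print(line)
```

The output can be loaded into a triple store or served as a static file, without touching the existing database.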

Page 18: Web Semantics with Jahia

WHAT?

• Definitions

• It’s real

Page 19: Web Semantics with Jahia

It's all about data

Publishing structured data:

Helps search engines
– Better indexing
– Better page rank

Eases external data integration
– Importing a CSV file requires a preliminary agreement on its structure
– Maintaining data is expensive: reuse published data (DBpedia, Freebase, GeoNames)

Page 20: Web Semantics with Jahia


Examples

GoodRelations annotations

Schema.org annotations

Page 21: Web Semantics with Jahia

HOW?

• Jahia fits

• Integration

Page 22: Web Semantics with Jahia

Client case: Bpi

One goal: use state-of-the-art Semantic Web technologies, since the client is a library (Bibliothèque Publique d’information)

3 main needs:
– Input data easily, for contents and within contents
– Store data in a safe, RDF-friendly manner
– Output data
  • On every page for SEO (RDFa)
  • In searches
  • In exports (RDF)

Good news: Jahia fits!

Page 23: Web Semantics with Jahia

The choice of Jahia

Input:

- Jahia lets you write clear content definitions (CND files) with inheritance.
- Jahia is content-centric

Enrich within contents: CKEditor

On contents: contribution or edit (GWT) modes

Page 24: Web Semantics with Jahia

The choice of Jahia: storage and output

Storage: you need a framework that can abstract different sources of data: enter JCR
– Unique repository for all content
– External data are abstracted: LDAP, files, other databases…

Output:
– Graph structure + XML format fit for metadata
– JSP views can easily be tailored for special export formats

Page 25: Web Semantics with Jahia

HOW?

• Jahia fits

• Integration

Page 26: Web Semantics with Jahia

Input: CKEditor and categories

Make sure text data is stored as plain HTML
- Properties file to map schema.org HTML code
- In-content schema.org properties: created a CKEditor plugin

Triple categorization of contents
– Categories (closed list)
– Tags (open)
– Authorities (closed, linked with BnF)

Next steps
– Need for a triple store?
– Categorization through automatic spider browsing?

Page 27: Web Semantics with Jahia

Content structure

Directories per category

The semantic mapping is transparent: no additional field to fill in

Properties files map a field to its semantic exports (Dublin Core, FOAF...)

Kinds of challenges met
– Where to store the metadata of a file: extend jnt:file
– How to create a sub-content while creating its parent: edit the Spring GWT XML
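The properties-file mapping mentioned on this slide can be pictured with the sketch below. It is not the project's actual file format or code: the field names (`title`, `author`, `created`), the `field = property` syntax, and both helper functions are assumptions made for illustration, and the `DC.*` meta names follow the common Dublin Core-in-HTML convention.

```python
from html import escape

# Hypothetical mapping file: content field name = exported property name.
MAPPING_FILE = """\
# field = semantic export
title = DC.title
author = DC.creator
created = DC.date
"""


def parse_mapping(text):
    """Parse 'field = property' lines, skipping blanks and # comments."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, prop = line.partition("=")
        mapping[field.strip()] = prop.strip()
    return mapping


def render_meta(content, mapping):
    """Build one <meta> tag per mapped field; unmapped fields are not exported."""
    return [
        f'<meta name="{mapping[field]}" content="{escape(str(value), quote=True)}">'
        for field, value in content.items()
        if field in mapping
    ]


# Sample content item; "internal_note" has no mapping and stays private.
content = {"title": "Semantic Web", "author": "A & B", "internal_note": "draft"}
tags = render_meta(content, parse_mapping(MAPPING_FILE))
print("\n".join(tags))
```

Because the mapping lives in a file, contributors fill in the usual fields and the semantic export follows automatically, which is the "transparent" behaviour the slide describes.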

Page 28: Web Semantics with Jahia

Vocabularies used

Page                                | Schema.org               | OpenGraph              | Dublin Core            | FOAF
Lists                               | No                       | No                     | No                     | No
Details on short and long contents  | Yes                      | Yes                    | Yes                    | Partial
Details: events, IT resource [file] | Yes                      | No                     | Yes                    | No
Authors                             | No                       | No                     | Yes                    | Yes
Place in HTML                       | Everywhere               | Header                 | Header                 | Everywhere
Format in HTML                      | RDFa                     | Meta                   | Meta                   | RDFa
In RDF                              | Yes                      | Yes, one line per meta | Yes, one line per meta | Yes, native
Contributed by                      | Automatic + manual (Bpi) | Automatic (mapping)    | Automatic (mapping)    | Automatic (mapping)

Page 29: Web Semantics with Jahia

Output

We chose RDFa because it is more widely used for now (than microdata)

Debate: should enrichment be done manually? Automatically? Through a mixed approach?

The dc:xxx field mapping will be used to improve search results

“ARK” URIs are used to exchange objects between repositories (internal, Jahia, external like BnF)

Page 30: Web Semantics with Jahia

Future

Free your data!
– Put them together
– Share them between applications and externally

Forces you to organize your IT differently

Page 31: Web Semantics with Jahia

Future: Facebook

Facebook is gradually promoting posts that contain Open Graph data [1]

"Facebook testing more uses for Open Graph" [2]

[1] http://newsroom.fb.com/News/787/News-Feed-FYI-What-Happens-When-You-See-More-Updates-from-Friends (January 21, 2014)

[2] http://allfacebook.com/add-to-my-movies-link_b128387

Page 32: Web Semantics with Jahia


Future : Web 3.0

Page 33: Web Semantics with Jahia

Conclusion

“If you’re not paying for it, you are the product” [1]

The Semantic Web is going to be imposed by internet giants because they need it to know you better

Take the first step to enrich your data, don’t miss the train!

Jahia 7 catches it:
– External data provider
– Quality, extensible editor

[1] http://blogs.law.harvard.edu/futureoftheinternet/2012/03/21/meme-patrol-when-something-online-is-free-youre-not-the-customer-youre-the-product/

Page 34: Web Semantics with Jahia


Webography:

New W3C Blog on Semantic Web & linked data : http://www.w3.org/blog/data/

http://fr.slideshare.net/AntidotNet/time2-market-lyon-13nov2013-slideshare#

http://fr.slideshare.net/terraces/technologies-du-web-smantique-pour-lentreprise-20

http://fr.slideshare.net/AntidotNet/web-smantique-web-de-donnes-web-30-linked-data-quelques-repres-pour-sy-retrouver

Questions & Answers