Semantic Technologies (Not Only) for Improved Search in SharePoint

DIQA Projektmanagement GmbH, Pfinztalstraße 90, 76227 Karlsruhe, [email protected]. Semantic Technologies (Not Only) for Improved Search in SharePoint. Daniel Hansch, Shared Solutions Day, 20 February 2014


Page 1: Semantic Technologies (Not Only) for Improved Search in SharePoint

DIQA Projektmanagement GmbH

Pfinztalstraße 90

76227 Karlsruhe

[email protected]

Semantic Technologies (Not Only) for Improved Search in SharePoint

Daniel Hansch

Shared Solutions Day, 20 February 2014

Page 2

© 2013 DIQA Projektmanagement GmbH | www.diqa-pm.com | Slide 2 DIQA Portfolio, January 2013

About DIQA GmbH

DIQA is an independent software vendor of knowledge management tools for ECM portals.

Our vision:

We provide our customers with services and products that turn their ECM portals into smart portals by introducing semantic web technologies. Smart portals let end-users better find, organize, process, control and govern unstructured content.

Founded: 2012

Team: SharePoint, MediaWiki, knowledge management and semantic web specialists

Location: Germany, Karlsruhe

Page 3

Agenda

• The Semantic Web
  • Vision, Goals
  • Principles
  • Base technologies
  • Available data
• Applications:
  • BBC Semantic Publishing
  • Google Knowledge Graph
  • Facebook Open Graph
  • Wikidata
• Using the Semantic Web in SharePoint
• Semantic Search in SharePoint

Page 4

The Semantic Web

• Tim Berners-Lee's vision of a semantic web: "The Semantic Web isn't just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data." (http://www.w3.org/DesignIssues/LinkedData.html)

• Note: We treat the following terms as synonyms:

• Semantic Web

• Web of Data

• Linked (Open) Data

Page 5

Linked Data Principles

★ Available on the web (whatever format) with an open license, to be Open Data

★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)

★★★ Available in a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Using open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff

★★★★★ Linked to other people's data to provide context

Source: Tim Berners-Lee (2010), http://www.w3.org/DesignIssues/LinkedData.html

Page 6

RDF Data Model

• The Web of Data is based on the RDF data model
• RDF is a semi-structured graph data model
• Nodes and edges are labeled with URIs
• Basic pattern (triple): subject-predicate-object
  • BusinessEntity1 offers Offering1
  • UnitPriceSpec1 hasValue "200.0"
• RDF can be serialized in many formats, incl. RDF/XML

http://www.heppnetz.de/projects/goodrelations/primer/images/fig1.png
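
The triple pattern on this slide can be sketched in a few lines of Python. This is only an illustration: the namespace `http://example.org/` and the helper names `uri`, `literal`, `to_ntriples` and `match` are assumptions, not part of the slides or of any RDF library. The output format follows the line-based N-Triples style.

```python
# Hypothetical namespace for illustration; the slide does not give full URIs.
EX = "http://example.org/"


def uri(local_name):
    """Wrap a local name as an N-Triples URI reference."""
    return f"<{EX}{local_name}>"


def literal(value):
    """Wrap a string as a plain literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# The two example statements from the slide as (subject, predicate, object).
triples = [
    (uri("BusinessEntity1"), uri("offers"), uri("Offering1")),
    (uri("UnitPriceSpec1"), uri("hasValue"), literal("200.0")),
]


def to_ntriples(graph):
    """Serialize triples, one statement per line, in N-Triples style."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in graph)


def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


print(to_ntriples(triples))
print(match(triples, p=uri("hasValue")))
```

The wildcard lookup in `match` is the same idea a SPARQL triple pattern expresses, only without variables being bound to names.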

Page 7

Linked Data Cloud 2007

Source for this and the following graphs: Linking Open Data cloud: Richard Cyganiak, Anja Jentzsch

Page 8

Linked Data Cloud 2008

Page 9

Linked Data Cloud 2009

Page 10

Linked Data Cloud 2010

Page 11

Linked Data Cloud 2011

Page 12

Agenda

• The Semantic Web
  • Vision, Goals
  • Principles
  • Base technologies
  • Available data
• Applications
  • BBC Semantic Publishing
  • Google Knowledge Graph
  • Facebook Open Graph
  • Wikidata
• Using the Semantic Web in SharePoint
• Semantic Search in SharePoint

Page 13

Linked Data Cloud 2011

Page 14

BBC

Early adopter of the Web of Data (WoD; "Linking Open Data" project). Roles:

• Data provider (program catalogue, artists)
• Data consumer (links to external resources about artists)
• Technology provider (similar to Thomson Reuters, Elsevier and NYT?)

Dynamic Semantic Publishing (DSP) architecture

• Semantic web technology stack to reduce curation effort for online media production
• Challenge: the BBC Sports sites for the 2010 World Cup and the Olympic Games had 700 index pages that required curation (e.g. links to story pages) and frequent updates.
• DSP replaces static publishing with dynamic aggregation that makes use of a metadata layer.
• Workflow:
  • Editors author stories
  • Stories are tagged (semi-)automatically
  • Index pages are generated automatically and kept up to date through queries that use tags.

Benefits

• Reduced curation effort
• Deeper and broader access to BBC content
• Increased quality
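
The workflow above (tagged stories, index pages built by queries over tags) can be sketched as follows. This is a minimal illustration, not the BBC's implementation: the `Story` class, the `index_page` function and all story titles and tags are invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    title: str
    published: str  # ISO date string, so lexical order equals date order
    tags: frozenset = field(default_factory=frozenset)


def index_page(stories, tag, limit=10):
    """Build an index page as a query: newest stories carrying the tag."""
    hits = [s for s in stories if tag in s.tags]
    hits.sort(key=lambda s: s.published, reverse=True)
    return [s.title for s in hits[:limit]]


# Hypothetical stories and tags, invented for illustration.
stories = [
    Story("Group stage preview", "2010-06-10", frozenset({"world-cup", "football"})),
    Story("Final match report", "2010-07-11", frozenset({"world-cup", "football"})),
    Story("Sprint heats results", "2012-08-04", frozenset({"olympics", "athletics"})),
]

print(index_page(stories, "world-cup"))
```

Because the page is computed from the tags on demand, publishing a new tagged story updates every matching index page without manual curation.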

Page 15

BBC Wildlife Portal

Page 16

BBC Wildlife Portal

Page 17

BBC Wildlife Portal

Page 18

Page 19

Agenda

• The Semantic Web
  • Vision, Goals
  • Principles
  • Base technologies
  • Available data
• Applications
  • BBC Semantic Publishing
  • Google Knowledge Graph
  • Facebook Open Graph
  • Wikidata
• Using the Semantic Web in SharePoint
• Semantic Search in SharePoint

Page 20

Google Knowledge Graph

• 2005: Google hires Guha (co-inventor of RSS and RDF)
• 2010: Google acquires Metaweb (developers of Freebase)
• 2011: Bing, Google and Yahoo! introduce Schema.org
  • Goal: common set of schemas for structured data markup on web pages
  • Based on ontologies and formal metadata
  • Improve search results
• 2012: Google starts enhancing search results with formal metadata from the Knowledge Graph
  • Based on Wikipedia crawls (~DBpedia)
  • Freebase
  • CIA World Factbook and more
• 2013: Google hires Denny Vrandečić (co-inventor of Semantic MediaWiki and Wikidata) …
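
To make "structured data markup on web pages" concrete, here is a small sketch that emits Schema.org markup as JSON-LD. The slides do not say which markup syntax is meant, so JSON-LD is an assumption chosen for brevity; the helper name `to_script_tag` is hypothetical. The organization details are taken from the title slide.

```python
import json

# Schema.org description of the presenting company, using details from
# the title slide. Organization and PostalAddress are Schema.org types.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "DIQA Projektmanagement GmbH",
    "url": "http://www.diqa-pm.com",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "Pfinztalstraße 90",
        "postalCode": "76227",
        "addressLocality": "Karlsruhe",
    },
}


def to_script_tag(data):
    """Wrap a JSON-LD object in the script element used to embed it in HTML."""
    body = json.dumps(data, ensure_ascii=False, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


markup = to_script_tag(organization)
print(markup)
```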

Page 21

Google Knowledge Graph

Page 22

Facebook Open Graph

• Started as the Social Graph (friends)
• Now, every web page/thing can become a node in the Facebook Graph
  • Social plugins on pages, e.g. Like
• Nodes can be linked with different kinds of edges
  • Friend, Like, write, listen, eat, cook
• Graph API makes data readable and writable for Facebook Apps

Page 23

Wikidata

Page 24

Wikidata in Wikipedia

Page 25

Agenda

• The Semantic Web
  • Vision, Goals
  • Principles
  • Base technologies
  • Available data
• Applications
  • BBC Semantic Publishing
  • Google Knowledge Graph
  • Facebook Open Graph
  • Wikidata
• Using the Semantic Web in SharePoint
• Semantic Search in SharePoint

Page 26

Linked Data Cloud: Life Sciences Data

Page 27

Other Sources for data in Life Sciences

• From the LOD cloud
  • UniProt
  • SIDER
  • DrugBank
  • PubMed
  • GeneOntology
  • PubChem
  • ChEMBL
  • KEGG Drug, Pathway, Enzyme, Reaction, …
  • …
• LinkedLifeData combines
  • ChEBI
  • Diseasome
  • DrugBank
  • EntrezGene
  • GeneOntology
  • NCI
  • SIDER
  • PubMed
  • UMLS
  • UniProt
  • …

http://linkedlifedata.com/

Page 28

Use Linked Data from UniProt to Filter SharePoint Documents

Terms from UniProt are used as "Semantic Tags". Each tag is associated with an enzyme in UniProt. This list of documents is generated from a SPARQL query that returns all documents about an enzyme that has "Magnesium" as cofactor.
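
The filter described on this slide might look roughly like the sketch below. The SPARQL text uses a placeholder vocabulary (`ex:taggedWith`, `ex:cofactor`); these are not the real UniProt or SharePoint property names, and the enzyme and document names are invented. The Python function performs the same join in memory so the logic can be followed without an endpoint.

```python
# Placeholder vocabulary: prefix and property names are assumptions made
# for illustration, not the actual UniProt or SharePoint schema.
QUERY = """
PREFIX ex: <http://example.org/schema#>
SELECT ?document WHERE {
  ?document ex:taggedWith ?enzyme .
  ?enzyme   ex:cofactor   "Magnesium" .
}
"""

# Hypothetical linked data: enzyme -> cofactors.
enzyme_cofactors = {
    "EnzymeA": {"Magnesium"},
    "EnzymeB": {"Zinc"},
    "EnzymeC": {"Magnesium", "Zinc"},
}

# Hypothetical SharePoint documents and their semantic tags.
document_tags = {
    "report-1.docx": {"EnzymeA"},
    "report-2.docx": {"EnzymeB"},
    "study-3.docx": {"EnzymeB", "EnzymeC"},
}


def documents_with_cofactor(cofactor):
    """Return documents tagged with at least one enzyme that has the cofactor."""
    return sorted(
        doc
        for doc, tags in document_tags.items()
        if any(cofactor in enzyme_cofactors.get(tag, set()) for tag in tags)
    )


print(documents_with_cofactor("Magnesium"))
```

The point of the example is that the cofactor is never stored on the document itself: it comes from the linked data set and is reached through the tag.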

Page 29

SharePoint add-on from DIQA: GRASP

[Architecture diagram: GRASP, deployed in SharePoint 2010, accesses SPARQL endpoints from the web of data (shown as the Linking Open Data cloud 1)) and displays GRASP visualizations in the web browser.]

1) Linking Open Data cloud: Richard Cyganiak, Anja Jentzsch

Read more about GRASP: http://www.diqa-pm.com/en/GRASP
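
The slide does not describe how GRASP talks to SPARQL endpoints. As general background, the sketch below shows the two standard pieces any client needs: a query URL following the SPARQL protocol (query text in the `query` parameter) and a reader for the SPARQL JSON results format. The endpoint URL and the sample response are made up, and no network request is sent.

```python
import json
from urllib.parse import urlencode


def build_request_url(endpoint, query):
    """Build a SPARQL protocol GET URL with the query as a URL parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dicts."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in data["results"]["bindings"]
    ]


# Hypothetical endpoint and hand-written sample response.
url = build_request_url("http://example.org/sparql", "ASK {}")
sample_response = """
{
  "head": {"vars": ["s"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/a"}}
  ]}
}
"""

print(url)
print(rows(sample_response))
```

A real client would send the URL with an `Accept: application/sparql-results+json` header and pass the response body to `rows`.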

Page 30

Agenda

• The Semantic Web
  • Vision, Goals
  • Principles
  • Base technologies
  • Available data
• Applications
  • BBC Semantic Publishing
  • Google Knowledge Graph
  • Facebook Open Graph
  • Wikidata
• Using the Semantic Web in SharePoint
• Semantic Search in SharePoint: SharePoint Findability Solution

Page 31

DIQA Projektmanagement GmbH

Pfinztalstraße 90

76227 Karlsruhe

[email protected]

DIQA's SharePoint Findability Solution

• Terminology management
• Automatic document classification
• Intelligent search

Page 32

SharePoint Findability Solution: Features

1. Upload and manage terminologies in the "library of ontologies" (e.g. SKOS and TBX/TermBase eXchange).
2. Load terminologies into term stores, groups or term sets.
3. Manage the terms in the terminology manager (e.g. labels in different languages).
4. Manage the relations between terms, including associations and poly-hierarchies.
5. Create classification rules in order to automatically tag the document corpus (requires Layer2 Autotagger).
6. Use the terminology to intelligently suggest search terms in the document search (Term Suggester).
7. Use the TreeView Refiner to drill down or drill up in the search results.
8. The user is guided in the search process by the "Matching Terms" and "Related Terms" web parts.

Page 33

1. Library of ontologies

Upload terminologies (in SKOS or TBX) and manage them in a library.
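
For readers unfamiliar with SKOS, the sketch below reads a tiny SKOS terminology serialized as RDF/XML and extracts multilingual labels and broader-term links. It is not DIQA's importer: the sample concepts, the `load_concepts` function and the `http://example.org/terms/` URIs are invented, and the parser only handles this one simple RDF/XML shape.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical two-concept terminology, invented for illustration.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/terms/enzyme">
    <skos:prefLabel xml:lang="en">Enzyme</skos:prefLabel>
    <skos:prefLabel xml:lang="de">Enzym</skos:prefLabel>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/terms/kinase">
    <skos:prefLabel xml:lang="en">Kinase</skos:prefLabel>
    <skos:broader rdf:resource="http://example.org/terms/enzyme"/>
  </skos:Concept>
</rdf:RDF>"""


def load_concepts(rdf_xml):
    """Map each concept URI to its labels per language and its broader terms."""
    root = ET.fromstring(rdf_xml)
    concepts = {}
    for node in root.findall(f"{{{SKOS}}}Concept"):
        about = node.get(f"{{{RDF}}}about")
        labels = {
            label.get(XML_LANG): label.text
            for label in node.findall(f"{{{SKOS}}}prefLabel")
        }
        broader = [
            link.get(f"{{{RDF}}}resource")
            for link in node.findall(f"{{{SKOS}}}broader")
        ]
        concepts[about] = {"labels": labels, "broader": broader}
    return concepts


for concept_uri, info in load_concepts(SAMPLE).items():
    print(concept_uri, info)
```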

Page 34

2. Load terminologies into the term store

(1) Select a terminology or taxonomy to populate a term store…

(2) Select the term store and the update strategy.

Page 35

3. Manage terms

Manage term labels in different languages, descriptions, …

Page 36

4. Manage relations between terms

Add terms that are related to this term…

Page 37

4. Manage relations between terms

Manage multiple parent terms (poly-hierarchy)…

Page 38

4. Manage relations between terms

…pick parent terms from the tree browser.

Page 39

4. Manage relations between terms

Inspect the full term hierarchy in the TreeBrowser.
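
A poly-hierarchy means a term may have more than one parent, so the term structure is a directed graph rather than a tree. The sketch below shows one way to represent that and to collect all broader terms of a given term. The term names and the `ancestors` function are invented for illustration and say nothing about how the product stores its terms.

```python
# Hypothetical poly-hierarchy: each term maps to a list of parent terms.
# "Kinase" has two parents, which a plain tree could not express.
parents = {
    "Kinase": ["Enzyme", "Drug target"],
    "Enzyme": ["Protein"],
    "Drug target": ["Protein"],
    "Protein": [],
}


def ancestors(term, parents):
    """Collect every broader term reachable through any parent link."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # visit each term once, even via two paths
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("Kinase", parents)))
```

Tracking visited terms matters here: "Protein" is reachable through both parents of "Kinase" and should still be reported only once.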

Page 40

5. Define classification rules

If a document satisfies this rule then it is tagged with a specific term.
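
The idea "if a document satisfies the rule, tag it with the term" can be sketched with simple keyword rules. This is an assumption-laden illustration: the `Rule` class, its `all_of`/`any_of` fields and the sample rules are invented and do not reflect the rule language of Layer2 Autotagger or of the DIQA product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    term: str            # tag applied when the rule matches
    all_of: tuple = ()   # every keyword must occur in the text
    any_of: tuple = ()   # at least one keyword must occur (if any are given)

    def matches(self, text):
        lowered = text.lower()
        has_all = all(k.lower() in lowered for k in self.all_of)
        has_any = not self.any_of or any(k.lower() in lowered for k in self.any_of)
        return has_all and has_any


def tag_document(text, rules):
    """Return the sorted set of terms whose rules the text satisfies."""
    return sorted({rule.term for rule in rules if rule.matches(text)})


# Hypothetical rules, invented for illustration.
rules = [
    Rule("Contract", any_of=("agreement", "contract")),
    Rule("Invoice", all_of=("invoice", "amount due")),
]

print(tag_document("Service agreement between A and B", rules))
print(tag_document("Invoice 17: amount due 200.0", rules))
```

Note that in this sketch a rule with no keywords at all matches every document, so a real rule editor would want to reject empty rules.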

Page 41

5. Define classification rules

Validate the rule before it is used to analyze your entire document corpus.

Page 42

5. Tag documents automatically

The entire SharePoint content is tagged automatically based on the classification rules.

Page 43

6. Search terms are intelligently suggested

The Term Suggester web part supports users while they type their search query…

Page 44

6. Search terms are intelligently suggested

…the intelligent matching algorithm suggests terms from the terminology that contain parts of the search query in labels and synonyms.
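
The matching behavior described here (terms whose labels or synonyms contain part of the query) can be approximated with a case-insensitive substring search. The product's actual matching algorithm is not documented on the slide; the `suggest` function and the sample terminology are assumptions for illustration.

```python
# Hypothetical terminology: each term has a preferred label and synonyms.
terms = [
    {"label": "Magnesium", "synonyms": ["Mg"]},
    {"label": "Manganese", "synonyms": ["Mn"]},
    {"label": "Enzyme", "synonyms": ["Biocatalyst"]},
]


def suggest(query, terms, limit=5):
    """Suggest term labels whose label or synonyms contain the typed text."""
    needle = query.strip().lower()
    if not needle:
        return []
    hits = []
    for term in terms:
        names = [term["label"], *term["synonyms"]]
        if any(needle in name.lower() for name in names):
            hits.append(term["label"])
    return sorted(hits)[:limit]


print(suggest("ma", terms))
print(suggest("cataly", terms))
```

The second call shows why synonyms matter: the user types part of "Biocatalyst" and is led to the preferred term "Enzyme".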

Page 45

7. Term-tree to navigate in search results

The TreeView Refiner web part extends the standard refiner web part and visualises the terms in the context of the term tree.

Page 46

7. Term-tree to navigate in search results

Users can select terms in the term tree to drill down or drill up in the search results.

Page 47

7. Term-tree to navigate in search results

Search results are updated as you navigate in the term tree.
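
Drilling down and up in a term tree can be sketched as filtering by a term together with all of its narrower terms. Everything below (term names, document names, the functions `subtree`, `refine` and `parent_of`) is invented for illustration. For simplicity the sketch uses a plain tree with one parent per term and assumes there are no cycles.

```python
# Hypothetical term tree: each term maps to its child terms.
children = {
    "Protein": ["Enzyme"],
    "Enzyme": ["Kinase", "Protease"],
    "Kinase": [],
    "Protease": [],
}

# Hypothetical search results with the terms each document is tagged with.
documents = {
    "a.docx": {"Kinase"},
    "b.docx": {"Protease"},
    "c.docx": {"Protein"},
}


def subtree(term):
    """Return the term plus all narrower terms below it."""
    result = {term}
    for child in children.get(term, []):
        result |= subtree(child)
    return result


def refine(term):
    """Keep documents tagged with the selected term or any narrower term."""
    wanted = subtree(term)
    return sorted(doc for doc, tags in documents.items() if tags & wanted)


def parent_of(term):
    """Return the parent term used for drilling up, or None at the root."""
    for parent, kids in children.items():
        if term in kids:
            return parent
    return None


print(refine("Enzyme"))             # starting point
print(refine("Kinase"))             # drill down: fewer results
print(refine(parent_of("Enzyme")))  # drill up: more results
```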

Page 48

8. Matching terms guide the user in the search process

Pick a new search term from the list of matching terms and resume the search.

Page 49

Advantages over standard SharePoint search

1. Superior managed metadata for content classification
2. Integrated taxonomies from various sources
3. Reliable automatic document tagging
4. Users find documents immediately, even when they do not know the taxonomy
5. Users are guided in the search process
6. The terms contained in the search results are presented in their taxonomic context
7. Users can easily drill up or drill down in the tree to broaden or narrow the search

Page 51

Take Home Message

• Semantic Web
• Open standards for publishing structured data (graph knowledge)
• Vast number of available data sources
• DIQA makes this knowledge accessible in SharePoint
• Metadata is one key benefit of SharePoint

Stop searching, start finding: the "SharePoint Findability" solution from DIQA provides reliable products and a proven method to find documents more quickly and efficiently.

Page 52

DIQA Projektmanagement GmbH Pfinztalstraße 90 76227 Karlsruhe

[email protected]

Visit us at http://www.diqa-pm.com

Thank you for your attention!