EDF2012 Mariana Damova - Factforge

FactForge Data Service and the Value of Inferred Knowledge. Mariana Damova, PhD. European Open Data Forum, June 2012

Transcript of EDF2012 Mariana Damova - Factforge

Page 1: EDF2012   Mariana Damova - Factforge

FactForge Data Service and the Value of Inferred Knowledge

Mariana Damova, PhD

European Open Data Forum, June 2012

Page 2: EDF2012   Mariana Damova - Factforge

Ontotext

– Top‐5 provider of core Semantic Technology
– Established in year 2000; offices in Bulgaria, UK, USA
– Active both in research and commercial projects

• 360° semantic technology – unique portfolio:
– Semantic Databases: high‐performance RDF DBMS, scalable reasoning
– Semantic Search: text‐mining (IE), metadata generation, Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management and Data Integration

• Good recognition in the SemTech community
– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at GYM, #3 for “linked data management” at Google

• Several joint ventures and subsidiaries
– Innovantage: leading online recruitment intelligence provider in UK

Page 3: EDF2012   Mariana Damova - Factforge

Ontotext Clients (selected)

British Broadcasting Corporation (BBC)
– Ran its World Cup 2010 sites on top of OWLIM
– Since Mar’12 BBC Sports and 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext

Press Association (UK)
– Analysis of Sports news
– Concept extraction
– Linked data generation

Top‐3 USA media (not allowed to name)

The National Archives (UK) contracted Ontotext to implement semantic KB and semantic search for the Government Web Archive

British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; British Museum’s public SPARQL end‐point is powered by OWLIM

de Bibliothek (Holland): aggregation of data from 150 library databases

Page 4: EDF2012   Mariana Damova - Factforge

Linked Open Data is maturing

LOD cloud grows by billions of triples yearly

Technologies and guidelines about
– how to produce linked data fast
– how to assure their quality
– how to provide vertical oriented data services

LOD2, LATC, baseKB

Page 5: EDF2012   Mariana Damova - Factforge

This talk is about

reasoning

and

coping with diversity of the data on the web of data

Page 6: EDF2012   Mariana Damova - Factforge

Outline

• FactForge (beta)

• Reference Layer

• Access Modes

• Querying
– Airports around London
– US city – a subject of a Novel
– US city – contactInformation

• Challenges

• Conclusion

Page 7: EDF2012   Mariana Damova - Factforge

FactForge (beta)

the largest body of heterogeneous general knowledge on which inference has been performed

– powered by OWLIM 5.0
– supporting SPARQL 1.1

Page 8: EDF2012   Mariana Damova - Factforge

Datasets

REASON‐ABLE VIEW of LOD datasets

Number of explicit statements: 1,796,673,630
Implicit statements: 1,3
Retrievable statements: 14,928,925,039

NY Times
CIA FactBook
DBpedia 3.7
Freebase
Lexvo
Lingvoj
Geonames
Wordnet 3.0
MusicBrainz

Materialization is performed with respect to the semantics of OWL-Horst optimized

Page 9: EDF2012   Mariana Damova - Factforge

Reference Layer

PROTON – lightweight upper-level ontology
~500 classes, ~150 properties
http://www.ontotext.com/proton-ontology

Linking at schema level:
(1) using rdfs:subClassOf and rdfs:subPropertyOf statements;
(2) using OWL expressions where there is a difference in the conceptualization;
(3) using inference rules if additional individuals are necessary in the repository to support the mapping
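Linking of type (1) can be illustrated with a toy forward-chaining sketch in Python. It is not how OWLIM materializes data; the `materialize` helper, the `ex:` resources, and the specific subclass and subproperty mappings are assumptions made for illustration, with prefixed names borrowed from the queries on the Querying slide. It shows how a few mapping statements let data stated in DBpedia and Geonames vocabulary be retrieved through PROTON terms.

```python
# Toy forward chaining over rdfs:subClassOf / rdfs:subPropertyOf.
# All triples below are illustrative assumptions, not FactForge data.

SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
TYPE = "rdf:type"


def materialize(explicit):
    """Return the closure of `explicit` under a handful of RDFS rules."""
    closure = set(explicit)
    while True:
        new = set()
        for s, p, o in closure:
            for s2, p2, o2 in closure:
                # subClassOf and subPropertyOf are transitive
                if p in (SUBCLASS, SUBPROP) and p2 == p and o == s2:
                    new.add((s, p, o2))
                # an instance of a subclass is an instance of the superclass
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # a statement made with a subproperty also holds
                # for its superproperty
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
        if new <= closure:
            return closure  # fixpoint reached, nothing new to add
        closure |= new


explicit = {
    # hypothetical schema-level mappings to the reference layer
    ("dbp-ont:Politician", SUBCLASS, "prot:Politician"),
    ("dbp-ont:birthPlace", SUBPROP, "prot:birthPlace"),
    ("geo-ont:parentFeature", SUBPROP, "prot:subRegionOf"),
    # hypothetical instance data in source vocabularies
    ("ex:Person1", TYPE, "dbp-ont:Politician"),
    ("ex:Person1", "dbp-ont:birthPlace", "ex:City1"),
    ("ex:City1", "geo-ont:parentFeature", "dbpedia:Germany"),
}

closure = materialize(explicit)
inferred = closure - explicit  # the implicit statements

for triple in sorted(inferred):
    print(triple)
```

The inferred set contains only statements phrased in `prot:` terms, which is why a query written against the reference layer can match data that was originally published in several different vocabularies.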

Page 10: EDF2012   Mariana Damova - Factforge

Access modes

RDF Search - retrieve ranked list of URIs related to literals, which contain specific keywords


Page 11: EDF2012   Mariana Damova - Factforge

Access modes (cont'd)

Exploration ‐ traversing the data, one resource at a time 

Page 12: EDF2012   Mariana Damova - Factforge

Access modes (cont'd)

Exploration ‐ traversing the data, one resource at a time, inspecting inferred knowledge

- locatedIn – Denmark, Northern Europe
- Geonames types/FeatureCodes P.PPL
- parentFeature – Denmark, Europe…

Page 13: EDF2012   Mariana Damova - Factforge

Access modes (cont'd)

Exploration - traversing the data, one resource at a time, inspecting inferred knowledge

- locatedIn - Europe
- subRegionOf - Europe
- hasContactInfo – website via Freebase
- containsLocation…

Page 14: EDF2012   Mariana Damova - Factforge

Access modes (cont'd)

SPARQL endpoint

Page 15: EDF2012   Mariana Damova - Factforge

Access modes (cont'd)

RelFinder

Page 16: EDF2012   Mariana Damova - Factforge

Querying

Using LOD concepts

SELECT * WHERE {
  ?Person dbp-ont:birthPlace ?BirthPlace ;
          rdf:type dbp-ont:Politician .
  ?BirthPlace geo-ont:parentFeature dbpedia:Germany .
}

Using the intermediary layer

SELECT * WHERE {
  ?Person prot:birthPlace ?BirthPlace ;
          rdf:type prot:Politician .
  ?BirthPlace prot:subRegionOf dbpedia:Germany .
}

Page 17: EDF2012   Mariana Damova - Factforge

Find Airports near London

Standard LOD vs. PROTON query: 13 vs. 20 results
DBpedia vs. DBpedia and Geonames

Page 18: EDF2012   Mariana Damova - Factforge

Find airports near London ‐ Results comparison

Using Geospatial index of OWLIM


Page 19: EDF2012   Mariana Damova - Factforge

City – a subject of a science fiction author


Page 20: EDF2012   Mariana Damova - Factforge

OWLIM 5.0 and SPARQL 1.1

Exemplary queries:

GROUP BY, min
– Minimal and maximal population counts of European countries

Federated Query between FactForge and LinkedLifeData
– Drugs that cure the disease from which Alexander Graham Bell died

Literal index over dates
– World governors in office between 1980 and 2005

Literal index over digits
– European countries with population above 20 million

Geospatial index
– Show the distance from London of airports located at most 50 miles away from it
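The geospatial example amounts to a distance computation plus a radius filter. The Python sketch below shows that logic with the haversine formula. It is a brute-force scan, not OWLIM's geospatial index, and the function names and the hand-entered approximate coordinates are assumptions for illustration rather than FactForge data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def within_radius(center, places, max_miles):
    """Return (name, distance) pairs within max_miles, nearest first."""
    hits = []
    for name, (lat, lon) in places.items():
        d = haversine_miles(center[0], center[1], lat, lon)
        if d <= max_miles:
            hits.append((name, d))
    return sorted(hits, key=lambda pair: pair[1])


# Approximate coordinates, entered by hand for this sketch.
LONDON = (51.5074, -0.1278)
places = {
    "Heathrow": (51.4700, -0.4543),
    "Gatwick": (51.1537, -0.1821),
    "Manchester": (53.3650, -2.2727),  # far outside a 50-mile radius
}

nearby = within_radius(LONDON, places, 50)
for name, miles in nearby:
    print(f"{name}: {miles:.1f} miles")
```

A real geospatial index avoids computing the distance to every candidate; the scan here only makes the query's semantics concrete.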


Page 21: EDF2012   Mariana Damova - Factforge

Challenges and usage

• Clean data
– Clean up input data

• At model level
– Contradiction detection
– Consistency checking

• Curation and upgrading methodology

FactForge has been used as data layer infrastructure in FP7 projects, like RENDER

FactForge has been used in tasks of
– linked data generation from unstructured data,
– metadata enrichment of structured data,
– providing linkages to the entire LOD cloud,
for example at The National Archives

Page 22: EDF2012   Mariana Damova - Factforge

Acknowledgements

Partial funding

Colleagues
Atanas Kiryakov, CEO of Ontotext
Zdravko Tashev, Ontotext
Ivan Peikov, Ontotext
Rouslan Velkov, Ontotext
Kiril Simov, Ontotext
Barry Bishop, Ontotext
Barry Norton, Ontotext
Marin Dimitrov, Ontotext
Alex Simov, Ontotext
Jordan Dichev, Ontotext
Konstantin Penchev, Ontotext

Links
http://ff-dev.ontotext.com
http://www.ontotext.com/owlim
http://www.ontotext.com/factforge

Email: [email protected]

Page 23: EDF2012   Mariana Damova - Factforge

Thank you for your attention!

[email protected]