Mariana Damova - Ontotext
OWLIM
Mariana Damova, PhD
DM2E, Vienna, November 2012
Ontotext
– Top-5 provider of core Semantic Technology
– Established in year 2000; offices in Bulgaria, UK, USA
– Active both in research and commercial projects (FP7 funding for 10 years)
• 360° semantic technology – unique portfolio:
– Semantic Databases: high-performance RDF DBMS, scalable reasoning
– Semantic Search: text-mining (IE), metadata generation, Information Retrieval (IR)
– Web Mining: focused crawling, screen scraping, data fusion
– Linked Data Management and Data Integration
Good recognition in the SemTech community
– Ontotext pages are ranked #1 for “semantic annotation” and “semantic repository” at
GYM, #3 for “linked data management” at Google
Several joint ventures and subsidiaries
– Innovantage: leading online recruitment intelligence provider in UK
Ontotext Clients (selected)
British Broadcasting Corporation (BBC)
– Runs its World Cup 2010 sites on top of OWLIM
– Since Mar’12 BBC Sports
– 2012 Olympics sections are driven by OWLIM and a Concept Extraction service developed by Ontotext
Press Association (UK)
– Analysis of Sports news
– Concept extraction
– Linked data generation
Top-3 USA media (not allowed to name)
The National Archives (UK): contracted Ontotext to implement a semantic KB and semantic search for the Government Web Archive
British Museum (UK): Ontotext leads the development of Phase 3 of the ResearchSpace project on collaborative research in cultural heritage; the British Museum’s public SPARQL end-point is powered by OWLIM
de Bibliothek (Holland): aggregation of data from 150 library databases
Semantic Technologies
• Semantic technologies (RDF, LOD) allow for an unprecedented ease of
integration of heterogeneous data sources
– Already adopted in pharmaceuticals and publishing industries
– Cultural heritage is next
BBC – when MySQL was replaced with OWLIM in their “Dynamic Semantic
Publishing” architecture, the BBC team observed a considerable reduction in the
complexity of database design, query specification, application
development, and query evaluation time. Source: “BBC World Cup 2010 dynamic
semantic publishing”, Jem Rayfield, Senior Technical Architect, BBC News
and Knowledge.
http://www.bbc.co.uk/blogs/bbcinternet/2010/07/bbc_world_cup_2010_dynamic_sem.html
OWLIM
• OWLIM is a family of scalable semantic repositories
• OWLIM-Lite: in-memory, fastest, scales to ~100 million statements
• OWLIM-SE: file-based, sameAs & query optimizations, scales to 20 billion
statements
• OWLIM-Enterprise: replication cluster deployment for resilience and high
performance parallel query-answering
Semantic Repository for RDFS and OWL
• OWLIM provides
– Management, integration and analysis of heterogeneous data
– Combined with light-weight, high-performance reasoning
– The inference is based on logical rule-entailment
– Full RDFS, OWL Horst, restricted OWL-Lite, OWL2-QL and OWL2 RL
– Custom semantics can be defined via rules and axiomatic triples
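As an illustration of what rule-entailment means in practice, here is a minimal Python sketch of forward-chaining materialisation. It is not OWLIM's engine: it applies only two standard RDFS rules (transitivity of rdfs:subClassOf and type propagation along it), and the `ex:` resources and the `materialize` function are hypothetical names chosen for this example.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def materialize(triples: Set[Triple]) -> Set[Triple]:
    """Apply two RDFS entailment rules until no new triple is derived."""
    closure = set(triples)
    while True:
        new = set()
        sub = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for a, b in sub:
            # Rule 1: rdfs:subClassOf is transitive.
            for c, d in sub:
                if b == c:
                    new.add((a, SUBCLASS, d))
            # Rule 2: an instance of a subclass is an instance of the superclass.
            for s, p, o in closure:
                if p == RDF_TYPE and o == a:
                    new.add((s, RDF_TYPE, b))
        if new <= closure:
            return closure  # fixpoint reached
        closure |= new


# Hypothetical example data, not taken from any real dataset.
explicit = {
    ("ex:Painting", SUBCLASS, "ex:Artwork"),
    ("ex:Artwork", SUBCLASS, "ex:CulturalObject"),
    ("ex:MonaLisa", RDF_TYPE, "ex:Painting"),
}
inferred = materialize(explicit) - explicit
```

The loop stops when a pass derives nothing new, which is the usual fixpoint condition for rule-based materialisation. Custom rules would be added as further clauses inside the loop.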
OWLIM in the Cultural Heritage Domain
Selected commercial projects
ResearchSpace project, funded by the Andrew W. Mellon Foundation: support for collaborative web-based research, information sharing and web publishing for the cultural heritage scholarly community. An Ontotext-led international consortium.
The Polish Digital National Museum aggregates artifacts from over 70 contributing cultural institutions in the Digital Libraries Federation PIONIER Network using Ontotext’s OWLIM repository.
LODAC (Linked Open Data in Academia): Japan's National Institute of Informatics aggregates various information across multiple Japanese resources as LOD. The system uses 8 OWLIM nodes and aggregates 19 collections with 700 000 entities and 15M triples.
SemTech for Cultural Heritage project, funded by ITCC: semantic publishing of Bulgarian cultural heritage to Europeana; establishing a Bulgarian technical aggregator for Europeana.
Selected research projects
MOLTO FP7 project: a use case in cultural heritage for a semantic knowledge representation infrastructure for querying RDF and presenting query results; includes close to 9K museum objects from two collections of The Gothenburg City
Charisma (Cultural Heritage Advanced Research Infrastructures): an EU-funded integrating activity project with a consortium of 21 partners and metadata from 6 major European cultural institutions; it has selected Ontotext’s OWLIM repository.
OWLIM PERFORMANCE
• OWLIM is a scalable, robust and efficient triple store
– Serving the two most important web-sites for the London Olympic Games
• Official Olympics website
• BBC Olympics website
– Performance highlights
• OWLIM loads the 100M and the 200M datasets almost twice as fast as the next best product
(17 min. for 100M)
• Best query performance among those repositories that can handle update and multi-client
query tasks (5,285 Query-mixes-per-hour, where a query mix contains 25 queries; e.g. about
100 queries/sec)
• OWLIM v5 is 43% faster than v.4.3 on the BSBM Explore and Update scenario
• OWLIM v5 requires between 25% and 70% less storage space
• OWL 2 RL-type languages have proven to be the only feasible approach for
reasoning with billion statements
Reasoning complexity
owl:sameAs Optimization
A way to handle equivalent statements through a single master node.
The effect is efficient and compact handling of inferred
statements, resulting in 4-6 times more statements available to query
than the explicitly introduced ones
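The master-node idea can be sketched with a union-find structure: every group of owl:sameAs-equivalent resources gets one representative, statements are stored once against that representative, and the equivalent variants are expanded only when queried. This is a sketch under assumptions, not OWLIM's actual implementation; `SameAsIndex`, `expand` and all resource identifiers below are hypothetical.

```python
from itertools import product


class SameAsIndex:
    """Union-find over resources; the root of each class acts as master node."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]  # path halving
            node = self.parent[node]
        return node

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Arbitrary but deterministic choice: the smaller id becomes master.
            self.parent[max(ra, rb)] = min(ra, rb)

    def members(self, node):
        root = self.find(node)
        return sorted(n for n in list(self.parent) if self.find(n) == root)


index = SameAsIndex()
index.union("dbpedia:Vienna", "geonames:2761369")
index.union("geonames:2761369", "freebase:Vienna")

# Statements are stored once, rewritten to use master nodes only.
explicit = [("dbpedia:Vienna", "ex:hosts", "ex:DM2E_Workshop")]
stored = {(index.find(s), p, index.find(o)) for s, p, o in explicit}


def expand(triple):
    """Enumerate all equivalent variants of a stored triple at query time."""
    s, p, o = triple
    return [(s2, p, o2) for s2, o2 in product(index.members(s), index.members(o))]


visible = [t for triple in stored for t in expand(triple)]
```

In this toy case one stored statement yields three queryable statements, which is the kind of ratio between stored and visible statements the slide refers to.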
OWLIM Replication Cluster
• Distribution through data replication is used to ensure:
– Better handling of concurrent user requests
– Failover support
• How does it work?
– Every user request is pushed in a transaction queue
– Each data write request is multiplexed to all repository instances
– Each read request is dispatched to one instance only
– To ensure load-balancing, each read request is sent to the
instance with the smallest execution queue at that point in time
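The dispatch policy described above can be sketched in a few lines of Python. This is an in-process toy model, not the OWLIM cluster protocol: the `Instance` and `Master` classes, the per-instance queues and the request format are all assumptions made for illustration.

```python
from collections import deque


class Instance:
    """A repository replica with its own execution queue (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # pending requests for this replica
        self.data = set()     # stored statements

    def drain(self):
        """Execute every queued request in order and return the read results."""
        results = []
        while self.queue:
            kind, payload = self.queue.popleft()
            if kind == "write":
                self.data.add(payload)
            else:
                results.append(payload in self.data)
        return results


class Master:
    """Routes requests: writes go to all replicas, reads to the least loaded."""

    def __init__(self, instances):
        self.instances = instances

    def submit(self, kind, payload):
        if kind == "write":
            for inst in self.instances:
                inst.queue.append((kind, payload))
            return [inst.name for inst in self.instances]
        target = min(self.instances, key=lambda inst: len(inst.queue))
        target.queue.append((kind, payload))
        return [target.name]


a, b = Instance("a"), Instance("b")
master = Master([a, b])
statement = ("ex:s", "ex:p", "ex:o")
written_to = master.submit("write", statement)  # multiplexed to both replicas
a.drain()                                       # replica a catches up, b is still busy
read_from = master.submit("read", statement)    # routed to the shorter queue
```

Because writes reach every replica, any replica can answer a read, which is also what makes failover possible when one of them goes down.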
Geo-spatial index
• Geo-spatial information concerns the geometry of points, shapes and distances relative to the surface of the Earth (or any spherical object).
• When using OWLIM-SE all angles are in decimal degrees with the latitude ranging from -90 to +90 degrees and the longitude ranging from -180 to +180 degrees.
• Airports have a reference point given by latitude, longitude and altitude.
• Political boundaries can be specified by polygons where each vertex is a 2-dimensional latitude/longitude pair.
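To make the coordinate conventions concrete, the following Python sketch validates latitude/longitude pairs in decimal degrees and computes a great-circle distance with the haversine formula. It is independent of OWLIM's geo-spatial plug-in; the function names and the mean Earth radius constant are assumptions of this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def check_point(lat, lon):
    """Validate a (latitude, longitude) pair given in decimal degrees."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, check_point(*p))
    lat2, lon2 = map(math.radians, check_point(*q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def check_polygon(vertices):
    """A boundary is a list of (lat, lon) vertices; validate each of them."""
    return [check_point(lat, lon) for lat, lon in vertices]


# Two points on the equator, a quarter of the way around the globe apart.
quarter = haversine_km((0.0, 0.0), (0.0, 90.0))
square = check_polygon([(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)])
```

Altitude is left out here because the distance is measured along the surface of the sphere.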
RDF Rank
• OWLIM-SE includes a plug-in that allows for efficient
calculation of a modification of PageRank over RDF graphs
• Computation of rank values is fast, e.g.
– 400M LOD statements takes 310 sec (27 iterations)
• Results are available through a system predicate
• Example: get the 100 most important nodes in the RDF graph
PREFIX rank: <http://www.ontotext.com/owlim/RDFRank#>
SELECT ?n WHERE { ?n rank:hasRDFRank ?r }
ORDER BY DESC(?r) LIMIT 100
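For readers unfamiliar with the underlying idea, here is a minimal Python sketch of plain PageRank computed by power iteration over the subject-to-object links of a triple set. The slides only say that RDF Rank is a modification of PageRank, so this shows the textbook algorithm, not the plug-in's variant; the function name, the damping factor of 0.85, the iteration count and the `ex:` resources are assumptions.

```python
def rdf_pagerank(triples, damping=0.85, iterations=30):
    """Textbook PageRank where each triple is a link from subject to object."""
    nodes = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    out_links = {n: [] for n in nodes}
    for s, _, o in triples:
        out_links[s].append(o)

    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread over all nodes.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        nxt = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                nxt[target] += damping * rank[n] / len(out_links[n])
        rank = nxt
    return rank


# Hypothetical toy graph: three resources all refer to ex:hub.
triples = [
    ("ex:a", "ex:refersTo", "ex:hub"),
    ("ex:b", "ex:refersTo", "ex:hub"),
    ("ex:c", "ex:refersTo", "ex:hub"),
]
rank = rdf_pagerank(triples)
top = max(rank, key=rank.get)
```

Sorting the nodes by descending rank and taking the first 100 corresponds to what the SPARQL query above asks the repository for.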
Define: nested repositories
“Nested repositories” represent a new data
management concept for RDF data:
• a mechanism for sharing data stored across
multiple repositories, where
• one of them contains a large body of
knowledge which gets embedded in other
repositories
• each containing more specific data, which are
being interlinked with the common body of
knowledge
http://www.ontotext.com/owlim