Ontopia Code Camp

Ontopia Code Camp TMRA 2009-11-11 Lars Marius Garshol & Geir Ove Grønmo


A presentation of the Ontopia product from the Ontopia Code Camp at TMRA 2009.

Transcript of Ontopia Code Camp

Page 1: Ontopia Code Camp

Ontopia Code Camp

TMRA 2009-11-11
Lars Marius Garshol & Geir Ove Grønmo

Page 2: Ontopia Code Camp

Agenda

• About you
  – who are you?
  – what do you want from the code camp?
• About Ontopia
• The product
• The future
• Participating in the project
• Writing some code!

Page 3: Ontopia Code Camp

Some background

About Ontopia

Page 4: Ontopia Code Camp

Brief history

• 1999-2000
  – private hobby project for Geir Ove
• 2000-2009
  – commercial software sold by Ontopia AS
  – lots of international customers in diverse fields
• 2009-
  – open source project

Page 5: Ontopia Code Camp

The project

• Open source hosted at Google Code
• Contributors
  – Lars Marius Garshol, Bouvet
  – Geir Ove Grønmo, Bouvet
  – Thomas Neidhart, SpaceApps
  – Lars Heuer, Semagia
  – Hannes Niederhausen, TMLab
  – Stig Lau, Bouvet
  – Baard H. Rehn-Johansen, Bouvet
  – Peter-Paul Kruijssen, Morpheus
  – Quintin Siebers, Morpheus

Page 6: Ontopia Code Camp

Current activity (toward 5.1)

• tolog updates
  – added by LMG
• Various fixes and optimizations
  – by everyone
• Toma implementation (in sandbox)
  – by Thomas
• TMQL implementation (in sandbox)?
  – by Sven Krosse

Page 7: Ontopia Code Camp

Architecture and modules

The product

Page 8: Ontopia Code Camp

The big picture

[Diagram: the Engine at the centre, surrounded by Ontopoly, portlet support, a web service, CMS integration (OKP, Escenic, other CMSs), data integration (DB2TM, XML2TM, others), automated classification, and taxonomy import.]

Page 9: Ontopia Code Camp

The engine

• Core API
• TMAPI 2.0 support
• Import/export
• RDF conversion
• TMSync
• Fulltext search
• Event API
• tolog query language
• tolog update language

Page 10: Ontopia Code Camp

Query Engine

• Implementation of Ontopia’s tolog language (based on Prolog and SQL)
  – Allows powerful queries on the topic map data structure
  – Simplifies application development and improves performance
• Example:
    select $B, count($A) from
      instance-of($B, city),
      { premiere($A : opera, $B : place) |
        premiere($A : opera, $C : place),
        located-in($C : containee, $B : container) }
    order by $A desc?
• Returns every B and the corresponding number of A's where
  – B is a city, AND
  – EITHER B is the place where A was premiered
  – OR the place where A was premiered is located in B
  in decreasing order of the number of A's
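The logic of the example query can be pictured with a small Python sketch. This is only an illustration of what the query computes, not how the tolog engine works; the sample data, the function name, and the tie-break by city name are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a topic map.
CITIES = {"Milano", "Venezia"}
PREMIERES = [                      # (opera, place)
    ("Otello", "Teatro alla Scala"),
    ("Falstaff", "Teatro alla Scala"),
    ("Rigoletto", "Venezia"),
]
LOCATED_IN = {"Teatro alla Scala": "Milano"}   # containee -> container


def premieres_per_city(cities, premieres, located_in):
    """Count distinct operas per city, premiered there or in a place inside it."""
    operas_by_city = defaultdict(set)
    for opera, place in premieres:
        if place in cities:                    # first branch of the OR
            operas_by_city[place].add(opera)
        container = located_in.get(place)
        if container in cities:                # second branch of the OR
            operas_by_city[container].add(opera)
    counted = [(city, len(operas)) for city, operas in operas_by_city.items()]
    # Highest count first; ties are broken by name (an assumption of this sketch).
    return sorted(counted, key=lambda pair: (-pair[1], pair[0]))


print(premieres_per_city(CITIES, PREMIERES, LOCATED_IN))
```

Expressing the same thing in tolog takes a single declarative query, which is the point the slide makes about simplifying application development.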

Page 11: Ontopia Code Camp

TMSync

• Configurable module for synchronizing one TM against another
  – define subset of source TM to sync (using tolog)
  – define subset of target TM to sync (using tolog)
  – the module handles the rest
• Can also be used with non-TM sources
  – create a non-updating conversion from the source to some TM format
  – then use TMSync to sync against the converted TM instead of directly against the source

Page 12: Ontopia Code Camp

How TMSync works

• Define which part of the target topic map you want,
• Define which part of the source topic map it is the master for, and
• The algorithm does the rest
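One way to picture "the rest" is as a set difference between the two selected subsets. The sketch below is a simplification under stated assumptions: statements are modelled as plain tuples, the two filters are ordinary Python functions rather than tolog queries, and the function names are invented for the example. It is not Ontopia's actual algorithm.

```python
def sync(source, target, in_source_scope, in_target_scope):
    """Make the selected part of the target mirror the selected part of the source."""
    wanted = {s for s in source if in_source_scope(s)}
    current = {s for s in target if in_target_scope(s)}
    to_add = wanted - current        # in the source subset, missing from the target
    to_remove = current - wanted     # in the target subset, no longer in the source
    updated = (set(target) - to_remove) | to_add
    return updated, to_add, to_remove


def is_los(statement):
    """Hypothetical filter: only statements about LOS topics take part in the sync."""
    return statement[0].startswith("los-")


# Statements modelled as (topic, property, value) tuples.
SOURCE = {("los-1", "name", "Schools"), ("los-2", "name", "Health")}
TARGET = {("los-1", "name", "School"), ("svc-9", "name", "Parking permit")}

updated, added, removed = sync(SOURCE, TARGET, is_los, is_los)
print(sorted(updated))
```

Statements outside the selected subsets, such as the service topic here, are left alone, which is why the subsets have to be defined on both sides.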

Page 13: Ontopia Code Camp

If the source is not a topic map

• Simply do a normal one-time conversion
  – let TMSync do the update for you
• In other words, TMSync reduces the update problem to a conversion problem
[Diagram: source.xml is run through convert.xslt, and TMSync applies the result.]

Page 14: Ontopia Code Camp

The City of Bergen use case

[Diagram: boxes labelled LOS, Service, Unit, and Person for the City of Bergen, and LOS for Norge.no.]

Page 15: Ontopia Code Camp

The backends

• In-memory
  – no persistent storage
  – thread-safe
  – no setup
• RDBMS
  – transactions
  – persistent
  – thread-safe
  – uses caching
  – clustering
• Remote
  – uses web service
  – read-only
  – unofficial
[Diagram: the Engine on top of the Memory, RDBMS, and Remote backends.]

Page 16: Ontopia Code Camp

RDBMS Backend

• Allows the Engine to use topic maps stored in a relational database
  – Based on a generic topic map schema
  – Necessary when working with very large topic maps
  – Transparent to applications
• Features
  – Automatically loads data when needed
  – Caches frequently used data
  – Full support for RDBMS transactions
  – Supports tolog-to-SQL compilation
  – Statistical reports for performance tuning
• Platform support
  – Oracle, MySQL, PostgreSQL, MS SQL Server
  – Test suite available for verifying compatibility with other JDBC-enabled RDBMSes

Page 17: Ontopia Code Camp

DB2TM

• Upconversion to TMs
  – from RDBMS via JDBC
  – or from CSV
• Uses XML mapping
  – can call out to Java
• Supports sync
  – either full rescan
  – or change table
[Diagram: the DB2TM, TMRAP, Nav, and Classify modules on top of the Engine and its Memory, RDBMS, and Remote backends.]

Page 18: Ontopia Code Camp

DB2TM example

ID  Name            Website
1   Ontopia         http://www.ontopia.net
2   United Nations  http://www.un.org
3   Bouvet          http://www.bouvet.no

<relation name="organizations.csv" columns="id name url">
  <topic type="ex:organization">
    <item-identifier>#org${id}</item-identifier>
    <topic-name>${name}</topic-name>
    <occurrence type="ex:homepage">${url}</occurrence>
  </topic>
</relation>

[Diagram: the table plus the mapping yields three organization topics: Ontopia, United Nations, and Bouvet.]
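What the mapping does to each row can be sketched in a few lines of Python. This is an illustration only, not DB2TM itself: the mapping is written as a Python dictionary rather than XML, topics are plain dictionaries, and the `convert` function is a name invented for the example. Python's `string.Template` happens to use the same `${column}` placeholder syntax as the mapping file.

```python
import csv
import io
from string import Template

# The three rows from the slide, as CSV without a header line.
CSV_DATA = """1,Ontopia,http://www.ontopia.net
2,United Nations,http://www.un.org
3,Bouvet,http://www.bouvet.no
"""
COLUMNS = ["id", "name", "url"]

# A dictionary rendering of the XML mapping (assumed structure, for illustration).
MAPPING = {
    "type": "ex:organization",
    "item-identifier": Template("#org${id}"),
    "topic-name": Template("${name}"),
    "occurrences": {"ex:homepage": Template("${url}")},
}


def convert(csv_text, columns, mapping):
    """Create one topic (a plain dict) per CSV row by filling in the templates."""
    topics = []
    for values in csv.reader(io.StringIO(csv_text)):
        row = dict(zip(columns, values))
        topics.append({
            "type": mapping["type"],
            "item-identifier": mapping["item-identifier"].substitute(row),
            "topic-name": mapping["topic-name"].substitute(row),
            "occurrences": {
                occ_type: template.substitute(row)
                for occ_type, template in mapping["occurrences"].items()
            },
        })
    return topics


for topic in convert(CSV_DATA, COLUMNS, MAPPING):
    print(topic["item-identifier"], topic["topic-name"])
```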

Page 19: Ontopia Code Camp

TMRAP

• Web service interface
  – via SOAP
  – via plain HTTP
• Requests
  – get-topic
  – get-topic-page
  – get-tolog
  – delete-topic
  – ...
[Diagram: the DB2TM, TMRAP, Nav, and Classify modules on top of the Engine and its Memory, RDBMS, and Remote backends.]

Page 20: Ontopia Code Camp

Navigator framework

• Servlet-based API
  – manage topic maps
  – load/scan/delete/create
• JSP tag library
  – XSLT-like
  – based on tolog
  – JSTL integration
[Diagram: the DB2TM, TMRAP, Nav, and Classify modules on top of the Engine and its Memory, RDBMS, and Remote backends.]

Page 21: Ontopia Code Camp

Ontopia Navigator Framework

• Java API for interacting with TM repository
• JSP tag library
  – based on tolog
  – kind of like XSLT in JSP with tolog instead of XPath
  – has JSTL integration
• Undocumented parts
  – web presentation components
  – some wrapped as JSP tags
  – want to build proper portlets from them

Page 22: Ontopia Code Camp

http://www.ontopia.net/operamap

Page 23: Ontopia Code Camp

Navigator tag library example

<%-- assume variable 'composer' is already set --%>
<p><b>Operas:</b><br/>
<tolog:foreach query="composed-by(%composer% : composer, $OPERA : opera),
                      { premiere-date($OPERA, $DATE) }?">
  <li>
    <a href="opera.jsp?id=<tolog:id var="OPERA"/>"><tolog:out var="OPERA"/></a>
    <tolog:if var="DATE"> <tolog:out var="DATE"/> </tolog:if>
  </li>
</tolog:foreach>
</p>

Page 24: Ontopia Code Camp

Elmer Preview

Page 25: Ontopia Code Camp
Page 27: Ontopia Code Camp
Page 28: Ontopia Code Camp

Automated classification

• Undocumented
  – experimental
• Extracts text
  – autodetects format
  – Word, PDF, XML, HTML
• Processes text
  – detects language
  – stemming, stop-words
• Extracts keywords
  – ranked by importance
  – uses existing topics
  – supports compound terms
[Diagram: the DB2TM, TMRAP, Nav, and Classify modules on top of the Engine and its Memory, RDBMS, and Remote backends.]

Page 29: Ontopia Code Camp

Example of keyword extraction

• topic maps              1.0
• metadata                0.57
• subject-based class.    0.42
• Core metadata           0.42
• faceted classification  0.34
• taxonomy                0.22
• monolingual thesauri    0.19
• controlled vocabulary   0.19
• Dublin Core             0.16
• thesauri                0.16
• Dublin                  0.15
• keywords                0.15
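A ranked list like this, with scores relative to a top score of 1.0, can be produced in many ways. The sketch below shows one of the simplest: drop stop-words, count the remaining terms, and divide by the highest count. It is an assumption made for illustration and not the classifier's actual scoring, which also handles stemming, language detection, and compound terms. The stop-word list and function name are invented for the example.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; real lists are much longer and per language.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "are", "for"}


def extract_keywords(text, top=5):
    """Rank terms by frequency, scaled so the most frequent term scores 1.0."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(word for word in words if word not in STOP_WORDS)
    if not counts:
        return []
    best = max(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(term, round(count / best, 2)) for term, count in ranked[:top]]


print(extract_keywords("Topic maps and metadata: topic maps are maps of topics."))
```

Note that without stemming "topic" and "topics" are counted as different terms, which is one reason the real module stems words first.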

Page 30: Ontopia Code Camp

Example #2

• Automated classification  1.0   5
• Topic Maps                0.51  14
• XSLT                      0.38  11
• compound keywords         0.29  2
• keywords                  0.26  20
• Lars                      0.23  1
• Marius                    0.23  1
• Garshol                   0.22  1

• ...

Page 31: Ontopia Code Camp

So how could this be used?

• To help users classify new documents in a CMS interface
  – suggest appropriate keywords, screened by user before approval
• Automate classification of incoming documents
  – this means lower quality, but also lower cost
• Get an overview of interesting terms in a document corpus
  – classify all documents, extract the most interesting terms
  – this can be used as the starting point for building an ontology
  – (keyword extraction only)

Page 32: Ontopia Code Camp

Example user interface

• The user creates an article
  – this screen then used to add keywords
  – user adjusts the proposals from the classifier

Page 33: Ontopia Code Camp

Vizigator

• Graphical visualization
• VizDesktop
  – Swing app to configure
  – filter/style/...
• Vizlet
  – Java applet for web
  – uses configuration
  – loads via TMRAP
  – uses “Remote” backend
[Diagram: Viz and Ontopoly on top of the DB2TM, TMRAP, Nav, and Classify modules, the Engine, and its Memory, RDBMS, and Remote backends.]

Page 34: Ontopia Code Camp

The Vizigator

• Graphical visualization of Topic Maps
• Two parts
  – VizDesktop: Swing desktop app for configuration
  – Vizlet: Java applet for web deployment
• Configuration stored in XTM file

Page 35: Ontopia Code Camp

Without configuration

Page 36: Ontopia Code Camp

With configuration

Page 37: Ontopia Code Camp

The Vizigator

• The Vizigator uses TMRAP
  – the Vizlet runs in the browser (on the client)
  – a fragment of the topic map is downloaded from the server
  – the fragment is grown as needed
[Diagram: the Vizlet talks to the server via TMRAP.]
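The idea of growing a fragment on demand can be sketched as follows. This is a conceptual illustration only: the in-memory `SERVER_GRAPH` and the `fetch_fragment` function stand in for the server and a web-service request, and none of the names come from the Vizigator or TMRAP.

```python
# Hypothetical server-side topic map: topic -> directly associated topics.
SERVER_GRAPH = {
    "Puccini": ["Tosca", "Turandot"],
    "Tosca": ["Puccini", "Rome"],
    "Turandot": ["Puccini", "Milan"],
    "Rome": ["Tosca"],
    "Milan": ["Turandot"],
}


def fetch_fragment(topic):
    """Stand-in for a web-service request returning a topic's neighbours."""
    return list(SERVER_GRAPH.get(topic, []))


class LocalView:
    """Client-side copy of the part of the graph the user has explored so far."""

    def __init__(self, start):
        self.edges = set()       # each edge is a frozenset of two topic names
        self.expanded = set()    # topics whose neighbours were already fetched
        self.expand(start)

    def expand(self, topic):
        """Fetch a topic's neighbours once and add them to the local fragment."""
        if topic in self.expanded:
            return
        for neighbour in fetch_fragment(topic):
            self.edges.add(frozenset((topic, neighbour)))
        self.expanded.add(topic)

    def topics(self):
        return {topic for edge in self.edges for topic in edge}


view = LocalView("Puccini")
print(sorted(view.topics()))     # only Puccini and his operas so far
view.expand("Tosca")             # the user clicks on Tosca
print(sorted(view.topics()))     # Rome is now part of the fragment
```

The client never holds the whole topic map, which keeps the initial download small even for large maps.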

Page 38: Ontopia Code Camp

Ontopoly

• Generic editor
  – web-based, AJAX
  – meta-ontology in TM
• Ontology designer
  – create types and fields
  – control user interface
  – build views
  – incremental dev
• Instance editor
  – guided by ontology
[Diagram: Viz and Ontopoly on top of the DB2TM, TMRAP, Nav, and Classify modules, the Engine, and its Memory, RDBMS, and Remote backends.]

Page 39: Ontopia Code Camp

Ontopoly

• A generic Topic Maps editor, in two parts
  – ontology editor: used to create the ontology and schema
  – instance editor: used to enter instances based on ontology
• Built with the Web Editor Framework
  – works with both XTM files and topic maps stored in RDBMS backend
  – supports access control to administrative functions, ontology, and instance editors
  – existing topic maps can be imported
  – parts of the ontology can be marked as read-only, or hidden

Page 40: Ontopia Code Camp
Page 41: Ontopia Code Camp

Typical deployment

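[Diagram: an application server hosting the Engine, its DB backend, DB2TM, and the frameworks, connected to databases. Users reach the viewing application and editors reach Ontopoly over HTTP; an external application connects through TMRAP.]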

Page 42: Ontopia Code Camp

CMS integration

• The best way to add content functionality to Ontopia
  – the world doesn’t need another CMS
  – better to reuse those which already exist
• So far two integrations exist
  – Escenic
  – OfficeNet Knowledge Portal
  – more are being worked on

Page 43: Ontopia Code Camp

Implementation

• A CMS event listener
  – the listener creates topics for new CMS articles, folders, etc
  – the mapping is basically the design of the ontology used by this listener
• Presentation integration
  – it must be possible to list all topics attached to an article
  – conversely, it must be possible to list all articles attached to a topic
  – how close the integration needs to be here will vary, as will the difficulty of the integration
• User interface integration
  – it needs to be possible to attach topics to an article from within the normal CMS user interface
  – this can be quite tricky
• Search integration
  – the Topic Maps search needs to also search content in the CMS
  – can be achieved by writing a tolog plug-in
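The first two points can be sketched in Python. Everything here is hypothetical: `TopicMapStub` is a tiny in-memory stand-in rather than the Ontopia API, and the listener methods are invented names, since each CMS has its own event interface.

```python
class TopicMapStub:
    """Tiny in-memory stand-in for a topic map (not the Ontopia API)."""

    def __init__(self):
        self.topics = {}        # topic id -> {"type": ..., "name": ...}
        self.is_about = set()   # (article topic id, subject topic id)


class ArticleListener:
    """Keeps article topics in step with create/delete events from the CMS."""

    def __init__(self, topicmap):
        self.tm = topicmap

    def article_created(self, article_id, title):
        topic_id = "article-%s" % article_id
        self.tm.topics[topic_id] = {"type": "article", "name": title}
        return topic_id

    def article_deleted(self, article_id):
        topic_id = "article-%s" % article_id
        self.tm.topics.pop(topic_id, None)
        # Also drop the associations the deleted article took part in.
        self.tm.is_about = {(a, s) for a, s in self.tm.is_about if a != topic_id}


def topics_for_article(tm, article_topic):
    """Presentation integration, direction 1: subjects an article is about."""
    return sorted(s for a, s in tm.is_about if a == article_topic)


def articles_for_topic(tm, subject_topic):
    """Presentation integration, direction 2: articles about a subject."""
    return sorted(a for a, s in tm.is_about if s == subject_topic)


tm = TopicMapStub()
tm.topics["elections"] = {"type": "subject", "name": "Elections"}
listener = ArticleListener(tm)
article = listener.article_created(42, "New city council appointed")
tm.is_about.add((article, "elections"))   # done by an editor in the CMS UI
print(articles_for_topic(tm, "elections"))
```

Which fields the listener copies into the topic, and which topic type it uses, is exactly the ontology design decision mentioned above.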

Page 44: Ontopia Code Camp

Articles as topics

• Goal: associate articles with topics
  – mainly to say what they are about
  – typically also want to include other metadata
• Need to create topics for the articles to do this
  – in fact, a general CMS-to-TM mapping is needed
  – must decide what metadata and structures to include
[Diagram: the article “New city council appointed” is connected to the topic “Elections” by an “is about” association.]

Page 45: Ontopia Code Camp

Mapping issues

• Article topics
  – what topic type to use?
  – title becomes name? (do you know the title?)
  – include author? include last modified? include workflow state?
  – should all articles be mapped?
• Folders/directories/sections/...
  – should these be mapped, too?
  – one topic type for all folders/.../.../...?
  – if so, use associations to connect articles to folders
  – use associations to reproduce hierarchical folder structure
• Multimedia objects
  – should these be included?
  – what topic type? what name? ...

Page 46: Ontopia Code Camp

Two styles of mappings

Articles as articles
• Topic represents only the article
• Topic type is some subclass of “article”
• “Is about” association connects article into topic map
• Fields are presentational
  – title, abstract, body

Articles as concepts
• Topic represents some real-world subject (like a person)
  – article is just the default content about that subject
• Type is the type of the subject (person)
• Semantic associations to the rest of the topic map
  – works in department, has competence, ...
• Fields can be semantic
  – name, phone no, email, ...

Page 47: Ontopia Code Camp

Article as article

• Article about building of a new school

• Is about association to “Primary schools”

• Topic type is “article”

Page 48: Ontopia Code Camp

Article as concept

• Article about a sports hall
• Article really represents the hall
• Topic type is “Location”
• Associations to
  – city borough
  – events in the location
  – category “Sports”

Page 49: Ontopia Code Camp
Page 50: Ontopia Code Camp
Page 51: Ontopia Code Camp
Page 52: Ontopia Code Camp
Page 53: Ontopia Code Camp
Page 54: Ontopia Code Camp

Two projects

Page 55: Ontopia Code Camp

The project

• A new citizen’s portal for the city administration
  – strategic decision to make portal main interface for interaction with citizens
  – as many services as possible are to be moved online
• Big project
  – started in late 2004, to continue at least into 2008
  – ~5 million Euro spent by launch date
  – 1.7 million Euro budgeted for 2007
  – Topic Maps development is a fraction of this (less than 25%)
• Many companies involved
  – Bouvet/Ontopia
  – Avenir
  – KPMG
  – Karabin
  – Escenic

Page 56: Ontopia Code Camp

Simplified original ontology

[Diagram: topic types External resource, Category, Subject, Department, Service, Employee, Borough, Form, and Article, with one link labelled “nearly everything”. Data sources shown: LOS, Service catalog, Payroll++, and Escenic (CMS).]

Page 57: Ontopia Code Camp

Data flow

[Diagram: data flows between Ontopia, Escenic, LOS, Fellesdata, Payroll (Agresso), Dexter/Extens, and the Service Catalog, labelled DB2TM, TMSync, Ontopoly, and Integration.]

Page 58: Ontopia Code Camp

Conceptual architecture

[Diagram: components Ontopia, Escenic, Application, Oracle Portal, Oracle Database, and data sources.]

Page 59: Ontopia Code Camp

The portal

Page 60: Ontopia Code Camp

Technical architecture

Page 61: Ontopia Code Camp

NRK/Skole

• Norwegian National Broadcasting (NRK)
  – media resources from the archives
  – published for use in schools
  – integrated with the National Curriculum
• In production
  – delayed by copyright wrangling
• Technologies
  – OKS
  – Polopoly CMS
  – MySQL database
  – Resin application server

Page 62: Ontopia Code Camp

Curriculum-based browsing (1)

Curriculum

Social studies

High school

Page 63: Ontopia Code Camp

Curriculum-based browsing (2)

Gender roles

Page 64: Ontopia Code Camp

Curriculum-based browsing (3)

Feminist movement in the 70s and 80s
Changes to the family in the 70s
The prime minister’s husband
Children choosing careers
Gay partnerships in 1993

Page 65: Ontopia Code Camp

One video (prime minister’s husband)

Metadata

Description

Subject

Person

Related resources

Page 66: Ontopia Code Camp

Conceptual architecture

[Diagram: components Polopoly, Ontopia, MySQL, MediaDB, Grep, and the RDBMS backend, with connections labelled DB2TM, TMSync, and HTTP, and editors as users.]

Page 67: Ontopia Code Camp

Implementation

• Domain model in Java
  – plain old Java objects built on
    • Ontopia’s Java API
    • tolog
• JSP for presentation
  – using JSTL on top of the domain model
• Subversion for the source code
• Maven2 to build and deploy
• Unit tests

Page 68: Ontopia Code Camp

What we’d like to see

The future

Page 69: Ontopia Code Camp

The big picture

[Diagram: the Engine at the centre, surrounded by Ontopoly, portlet support, a web service, CMS integration (OKP, Escenic, other CMSs), data integration (DB2TM, XML2TM, others), automated classification, and taxonomy import.]

Page 70: Ontopia Code Camp

CMS integrations

• The more of these, the better
• Candidate CMSs
  – Liferay (being worked on at Bouvet)
  – Alfresco (might be started soon)
  – Magnolia
  – Inspera (possible project here)
  – JSR-170 Java Content Repository
  – CMIS (OASIS web service standard)

Page 71: Ontopia Code Camp

Portlet toolkit

• Subversion contains a number of “portlets”
  – basically, Java objects doing presentation tasks
  – some have JSP wrappers as well
• Examples
  – display tree view
  – list of topics filterable by facets
  – show related topics
  – get-topic-page via TMRAP component
• Not ready for prime-time yet
  – undocumented
  – incomplete

Page 72: Ontopia Code Camp

Ontopoly plug-ins

• Plugins for getting more data from externals
  – TMSync import plugin
  – DB2TM plugin
  – Subj3ct.com plugin
  – adapted RDF2TM plugin
  – classify plugin
  – ...
• Plugins for ontology fragments
  – menu editor, for example

Page 73: Ontopia Code Camp

TMCL

• Now implementable
• We’d like to see
  – an object model for TMCL (supporting changes)
  – a validator based on the object model
  – Ontopoly import/export from TMCL (initially)
  – refactor Ontopoly API to make it more portable
  – Ontopoly ported to use TMCL natively (eventually)

Page 74: Ontopia Code Camp

Things we’d like to remove

• OSL support
  – Ontopia Schema Language
• Web editor framework
  – unfortunately, still used by some major customers
• Fulltext search
  – the old APIs for this are not really of any use

Page 75: Ontopia Code Camp

Management interface

• Import topic maps (to file or RDBMS)

Page 76: Ontopia Code Camp

What do you think?

• Suggestions?
• Questions?
• Plans?
• Ideas?

Page 77: Ontopia Code Camp

Setting up the developer environment

Getting started

Page 78: Ontopia Code Camp

If you are using Ontopia...

• ...simply download the zip, then
  – unzip,
  – set the classpath,
  – start the server, ...
• ...and you’re good to go

Page 79: Ontopia Code Camp

If you are developing Ontopia...

• You must have
  – Java 1.5 (not 1.6 or 1.7 or ...)
  – Ant 1.6 (or later)
  – Ivy 2.0 (or later)
  – Subversion
• Then
  – check out the source from Subversion
      svn checkout http://ontopia.googlecode.com/svn/trunk/ ontopia-read-only
  – ant bootstrap
  – ant dist.jar.ontopia
  – ant test
  – ant dist.ontopia

Page 80: Ontopia Code Camp

Beware

• This is fun, because
  – you can play around with anything you want
  – e.g., my build has a faster TopicIF.getRolesByType
  – you can track changes as they happen in svn
• However, you’re on your own
  – if it fails it’s kind of hard to say why
  – maybe it’s your changes, maybe not

• For production use, official releases are best

Page 81: Ontopia Code Camp

Participating etc

The project

Page 82: Ontopia Code Camp

Our goal

• To provide the best toolkit for building Topic Maps-based applications

• We want it to be
  – actively maintained,
  – bug-free,
  – scalable,
  – easy to use,
  – well documented,
  – stable,
  – reliable

Page 83: Ontopia Code Camp

Our philosophy

• We want Ontopia to provide as much useful more-or-less generic functionality as possible

• New contributions are generally welcome as long as
  – they meet the quality requirements, and
  – they don’t cause problems for others

Page 84: Ontopia Code Camp

The sandbox

• There’s a lot of Ontopia-related code which does not meet those requirements
  – some of it can be very useful,
  – someone may pick it up and improve it
• The sandbox is for these pieces
  – some are in Ontopia’s Subversion repository,
  – others are maintained externally
• To be “promoted” into Ontopia a module needs
  – an active maintainer,
  – to be generally useful, and
  – to meet certain quality requirements

Page 85: Ontopia Code Camp

Communications

• Join the mailing list(s)!
  – http://groups.google.com/group/ontopia
  – http://groups.google.com/group/ontopia-dev
• Google Code page
  – http://code.google.com/p/ontopia/
  – note the “updates” feed!
• Blog
  – http://ontopia.wordpress.com
• Twitter
  – http://twitter.com/ontopia

Page 86: Ontopia Code Camp

Committers

• These are the people who run the project
  – they can actually commit to Subversion
  – they can vote on decisions to be made etc
• Everyone else can
  – use the software as much as they want,
  – report and comment on issues,
  – discuss on the mailing list, and
  – submit patches for inclusion

Page 87: Ontopia Code Camp

How to become a committer

• Participate in the project!
  – that is, get involved first
  – let people get to know you, show some commitment
• Once you’ve gotten some way into the project you can ask to become a committer
  – best if you have provided some patches first
• Unless you’re going to commit changes there’s no need to be a committer

Page 88: Ontopia Code Camp

Finding a task to work on

• Report bugs!
  – they exist; if you find any, please report them
• Look at the open issues
  – there is always testing/discussion to be done
• Look for issues marked “newbie”
  – http://code.google.com/p/ontopia/issues/list?q=label:Newbie
• Look at what’s in the sandbox
  – most of these modules need work
• Scratch an itch
  – if there’s something you want fixed/changed/added...

Page 89: Ontopia Code Camp

How to fix a bug

• First figure out why you think it fails
• Then write a test case
  – based on your assumption
  – make sure the test case fails (test before you fix)
• Then fix the bug
  – follow the coding guidelines (see wiki)
• Then run the test suite
  – verify that you’ve fixed the bug
  – verify that you haven’t broken anything
• Then submit the patch
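The test-before-you-fix workflow above can be sketched in Python with the standard `unittest` module. The helper function and the bug described in its comment are invented for this illustration; Ontopia's own tests are Java classes in the *.test packages.

```python
import unittest


def strip_fragment(uri):
    """Return the URI without its #fragment (hypothetical helper)."""
    # The imagined buggy version was `uri.split("#")[1]`, which returned the
    # fragment itself and raised IndexError for URIs without a fragment.
    return uri.split("#", 1)[0]


class StripFragmentTest(unittest.TestCase):
    """Written before the fix; both tests failed against the buggy version."""

    def test_with_fragment(self):
        self.assertEqual(strip_fragment("http://example.org/tm#org1"),
                         "http://example.org/tm")

    def test_without_fragment(self):
        self.assertEqual(strip_fragment("http://example.org/tm"),
                         "http://example.org/tm")


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(StripFragmentTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Seeing the test fail first confirms that it actually exercises the bug; a test that passes before the fix proves nothing about the fix.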

Page 90: Ontopia Code Camp

The test suite

• Lots of *.test packages in the source tree
  – 3148 test cases as of right now
  – test data in ontopia/src/test-data
  – some tests are generators based on files
  – some of the test files come from cxtm-tests.sf.net
• Run with
  – ant test
  – java net.ontopia.test.TestRunner src/test-data/config/tests.xml test-group

Page 91: Ontopia Code Camp

Source tree structure

• net.ontopia.
  – utils        various utilities
  – test         various test support code
  – infoset      LocatorIF code + cruft
  – persistence  OR-mapper for RDBMS backend
  – product      cruft
  – xml          various XML-related utilities
  – topicmaps    next slides

Page 92: Ontopia Code Camp

Source tree structure

• net.ontopia.topicmaps.
  – core          core engine API
  – impl          engine backends + utils
  – utils         utilities (see next slide)
  – cmdlineutils  command-line tools
  – entry         TM repository
  – nav + nav2    navigator framework
  – query         tolog engine
  – viz
  – classify
  – db2tm
  – webed         cruft

Page 93: Ontopia Code Camp

Source tree structure

• net.ontopia.topicmaps.utils
  – *      various utility classes
  – ltm    LTM reader and writer
  – ctm    CTM reader
  – rdf    RDF converter (both ways)
  – tmrap  TMRAP implementation

Page 94: Ontopia Code Camp

Let’s write some code!

Page 95: Ontopia Code Camp

The engine

• The core API corresponds closely to the TMDM
  – TopicMapIF, TopicIF, TopicNameIF, ...
• Compile with
  – ant init compile.ontopia
  – .class files go into ontopia/build/classes
  – ant dist.ontopia.jar # makes a jar

Page 96: Ontopia Code Camp

The importers

• Main class implements TopicMapReaderIF
  – usually, this lets you set up configuration, etc
  – then uses other classes to do the real work
• XTM importers
  – use an XML parser
  – main work done in XTM(2)ContentHandler
  – some extra code for validation and format detection
• CTM/LTM importers
  – use Antlr-based parsers
  – real code in ctm.g/ltm.g
• All importers work via the core API
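The division of labour described above (a thin reader class, a content handler doing the real work, and everything going through the core API) can be sketched with Python's standard SAX parser. The class names, the `add_topic` method, and the simplified XML are assumptions for illustration; they are not Ontopia's classes, and the input is not real XTM.

```python
import xml.sax


class TopicMap:
    """Minimal stand-in for the core API: the only way to add data."""

    def __init__(self):
        self.topics = []

    def add_topic(self, topic_id):
        self.topics.append(topic_id)


class TopicHandler(xml.sax.ContentHandler):
    """Does the real work: turns parser events into core API calls."""

    def __init__(self, topicmap):
        super().__init__()
        self.topicmap = topicmap

    def startElement(self, name, attrs):
        if name == "topic":
            self.topicmap.add_topic(attrs.get("id"))


class SimpleReader:
    """Thin front end: holds the source and delegates to the handler."""

    def __init__(self, xml_bytes):
        self.xml_bytes = xml_bytes

    def read(self):
        topicmap = TopicMap()
        xml.sax.parseString(self.xml_bytes, TopicHandler(topicmap))
        return topicmap


SAMPLE = b'<topicMap><topic id="ontopia"/><topic id="bouvet"/></topicMap>'
print(SimpleReader(SAMPLE).read().topics)
```

Because the handler only talks to the topic map through its public methods, the same importer works regardless of which backend stores the data.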

Page 97: Ontopia Code Camp

Fixing a real bug

• There is a failing test case in the TM/XML importer

• So let’s fix that right now...

Page 98: Ontopia Code Camp

Find an issue in the issue tracker

• (Picking one marked “Newbie” might be good, but isn’t necessary)
• Get set up
  – check out the source code
  – build the code
  – run the test suite
• Then dig in
  – we’ll help you with any questions you have
• At the end, submit a patch to the issue tracker
  – remember to use the test suite!

• At the end, submit a patch to the issue tracker– remember to use the test suite!