Post on 15-Jan-2015
Ontopia Code Camp
TMRA, 2009-11-11
Lars Marius Garshol & Geir Ove Grønmo
Agenda
• About you
– who are you?
– what do you want from the code camp?
• About Ontopia
• The product
• The future
• Participating in the project
• Writing some code!
Some background
About Ontopia
Brief history
• 1999-2000
– private hobby project for Geir Ove
• 2000-2009
– commercial software sold by Ontopia AS
– lots of international customers in diverse fields
• 2009-
– open source project
The project
• Open source hosted at Google Code
• Contributors
– Lars Marius Garshol, Bouvet
– Geir Ove Grønmo, Bouvet
– Thomas Neidhart, SpaceApps
– Lars Heuer, Semagia
– Hannes Niederhausen, TMLab
– Stig Lau, Bouvet
– Baard H. Rehn-Johansen, Bouvet
– Peter-Paul Kruijssen, Morpheus
– Quintin Siebers, Morpheus
Current activity (toward 5.1)
• tolog updates
– added by LMG
• Various fixes and optimizations
– by everyone
• Toma implementation (in sandbox)
– by Thomas
• TMQL implementation (in sandbox)?
– by Sven Krosse
Architecture and modules
The product
The big picture
[Diagram labels: Engine, Ontopoly, Portlet support, CMS integration (OKP, Escenic, other CMSs), Data integration (DB2TM, XML2TM, others), Auto-classification, Taxonomy import, Web service]
The engine
• Core API
• TMAPI 2.0 support
• Import/export
• RDF conversion
• TMSync
• Fulltext search
• Event API
• tolog query language
• tolog update language
Engine
Query Engine
• Implementation of Ontopia’s tolog language (based on Prolog and SQL)
– Allows powerful queries on the topic map data structure
– Simplifies application development and improves performance
• Example:
  select $B, count($A) from
    instance-of($B, city),
    { premiere($A : opera, $B : place) |
      premiere($A : opera, $C : place),
      located-in($C : containee, $B : container) }
  order by $A desc?
• returns all B's and the corresponding number of A's where
  B is a city AND
  EITHER B is the place where A was premiered
  OR the place where A was premiered is located in B,
  in decreasing order of A
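To make the AND / EITHER / OR logic of the example concrete, here is a minimal Python sketch of the same reasoning over plain data structures. It is not tolog and does not use the Ontopia API; the sample data, the function name operas_per_city, and the choice to sort by the count are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample data (not taken from any real topic map).
cities = {"Milano", "Roma"}
premieres = [                      # (opera, place of premiere)
    ("Tosca", "Teatro Costanzi"),
    ("Otello", "Milano"),
    ("Falstaff", "Teatro alla Scala"),
]
located_in = {                     # containee -> container
    "Teatro Costanzi": "Roma",
    "Teatro alla Scala": "Milano",
}


def operas_per_city(cities, premieres, located_in):
    """Count operas per city, following the two branches of the OR clause."""
    pairs = set()                  # distinct (city, opera) matches
    for opera, place in premieres:
        # Branch 1: the opera premiered in the city itself.
        if place in cities:
            pairs.add((place, opera))
        # Branch 2: the opera premiered in a place located in the city.
        container = located_in.get(place)
        if container in cities:
            pairs.add((container, opera))
    counts = Counter(city for city, _ in pairs)
    # Assumption: sort by count, highest first, then by name for stability.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


print(operas_per_city(cities, premieres, located_in))
```

The set of (city, opera) pairs plays the role of the query's matches before counting, so an opera that satisfies both branches for the same city is only counted once.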
TMSync
• Configurable module for synchronizing one TM against another
– define subset of source TM to sync (using tolog)
– define subset of target TM to sync (using tolog)
– the module handles the rest
• Can also be used with non-TM sources
– create a non-updating conversion from the source to some TM format
– then use TMSync to sync against the converted TM instead of directly against the source
How TMSync works
• Define which part of the target topic map you want,
• Define which part of the source topic map it is the master for, and
• The algorithm does the rest
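The idea behind these three steps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not TMSync's actual algorithm: topic maps are modelled as plain dicts, the two predicate functions stand in for the tolog queries that select the subsets, and the name sync is hypothetical.

```python
def sync(source, target, in_source_subset, in_target_subset):
    """Make the selected part of `target` mirror the selected part of `source`.

    `source` and `target` map topic ids to dicts of properties. The two
    predicates stand in for the tolog queries that define the subsets.
    """
    # The part of the source that is the master for the target subset.
    wanted = {tid: data for tid, data in source.items()
              if in_source_subset(tid, data)}

    # Add or update everything the source is master for.
    for tid, data in wanted.items():
        target[tid] = dict(data)

    # Remove target topics in the synced subset that the source no longer has.
    stale = [tid for tid, data in target.items()
             if in_target_subset(tid, data) and tid not in wanted]
    for tid in stale:
        del target[tid]
    return target


def is_org(tid, data):
    return data["type"] == "organization"


source = {
    "org1": {"type": "organization", "name": "Ontopia"},
    "p1": {"type": "person", "name": "Somebody"},
}
target = {
    "org2": {"type": "organization", "name": "Gone from source"},
    "note": {"type": "note", "name": "Local only"},
}
print(sync(source, target, is_org, is_org))
```

Topics outside the selected subsets (the person in the source, the note in the target) are left untouched, which is the point of defining the subsets explicitly.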
If the source is not a topic map
• Simply do a normal one-time conversion
– let TMSync do the update for you
• In other words, TMSync reduces the update problem to a conversion problem
[Diagram labels: source.xml, convert.xslt, TMSync]
The City of Bergen usecase
[Diagram labels: LOS, Service, Unit, Person, City of Bergen, LOS, Norge.no]
The backends
• In-memory
– no persistent storage
– thread-safe
– no setup
• RDBMS
– transactions
– persistent
– thread-safe
– uses caching
– clustering
• Remote
– uses web service
– read-only
– unofficial
Engine
Memory RDBMS Remote
RDBMS Backend
• Allows the Engine to use topic maps stored in a relational database
– Based on a generic topic map schema
– Necessary when working with very large topic maps
– Transparent to applications
• Features
– Automatically loads data when needed
– Caches frequently used data
– Full support for RDBMS transactions
– Supports tolog-to-SQL compilation
– Statistical reports for performance tuning
• Platform support
– Oracle, MySQL, PostgreSQL, MS SQL Server
– Test suite available for verifying compatibility with other JDBC-enabled RDBMSes
DB2TM
• Upconversion to TMs
– from RDBMS via JDBC
– or from CSV
• Uses XML mapping
– can call out to Java
• Supports sync
– either full rescan
– or change table
Engine
Memory RDBMS Remote
DB2TM TMRAP Nav Classify
DB2TM example
ID  Name            Website
1   Ontopia         http://www.ontopia.net
2   United Nations  http://www.un.org
3   Bouvet          http://www.bouvet.no
+
<relation name="organizations.csv" columns="id name url">
  <topic type="ex:organization">
    <item-identifier>#org${id}</item-identifier>
    <topic-name>${name}</topic-name>
    <occurrence type="ex:homepage">${url}</occurrence>
  </topic>
</relation>
=
[Diagram: resulting topics Ontopia, United Nations, Bouvet]
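What the mapping does to each row can be sketched in Python. This is a conceptual illustration only, not DB2TM itself: the mapping is restated as a Python dict, the CSV is assumed to have a header row with the column names id, name and url, and rows_to_topics is a hypothetical helper name.

```python
import csv
import io
from string import Template

# The table from the slide, assuming a header row that matches the
# column names used in the mapping (id, name, url).
CSV_DATA = """id,name,url
1,Ontopia,http://www.ontopia.net
2,United Nations,http://www.un.org
3,Bouvet,http://www.bouvet.no
"""

# The XML mapping restated as a dict; ${...} placeholders refer to columns.
MAPPING = {
    "type": "ex:organization",
    "item-identifier": "#org${id}",
    "topic-name": "${name}",
    "occurrences": {"ex:homepage": "${url}"},
}


def rows_to_topics(csv_text, mapping):
    """Produce one topic (as a dict) per CSV row by filling in the placeholders."""
    topics = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        topics.append({
            "type": mapping["type"],
            "item-identifier": Template(mapping["item-identifier"]).substitute(row),
            "topic-name": Template(mapping["topic-name"]).substitute(row),
            "occurrences": {
                occ_type: Template(value).substitute(row)
                for occ_type, value in mapping["occurrences"].items()
            },
        })
    return topics


for topic in rows_to_topics(CSV_DATA, MAPPING):
    print(topic["item-identifier"], topic["topic-name"])
```

Each row yields exactly one topic, which is why three rows in the table give the three topics in the result.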
TMRAP
• Web service interface
– via SOAP
– via plain HTTP
• Requests
– get-topic
– get-topic-page
– get-tolog
– delete-topic
– ...
Navigator framework
• Servlet-based API
– manage topic maps
– load/scan/delete/create
• JSP tag library
– XSLT-like
– based on tolog
– JSTL integration
Ontopia Navigator Framework
• Java API for interacting with TM repository
• JSP tag library
– based on tolog
– kind of like XSLT in JSP with tolog instead of XPath
– has JSTL integration
• Undocumented parts
– web presentation components
– some wrapped as JSP tags
– want to build proper portlets from them
Navigator tag library example
<%-- assume variable 'composer' is already set --%>
<p><b>Operas:</b><br/>
<tolog:foreach query="composed-by(%composer% : composer, $OPERA : opera),
                      { premiere-date($OPERA, $DATE) }?">
  <li>
    <a href="opera.jsp?id=<tolog:id var="OPERA"/>"
      ><tolog:out var="OPERA"/></a>
    <tolog:if var="DATE">
      <tolog:out var="DATE"/>
    </tolog:if>
  </li>
</tolog:foreach></p>
Automated classification
• Undocumented
– experimental
• Extracts text
– autodetects format
– Word, PDF, XML, HTML
• Processes text
– detects language
– stemming, stop-words
• Extracts keywords
– ranked by importance
– uses existing topics
– supports compound terms
Example of keyword extraction
• topic maps 1.0
• metadata 0.57
• subject-based class. 0.42
• Core metadata 0.42
• faceted classification 0.34
• taxonomy 0.22
• monolingual thesauri 0.19
• controlled vocabulary 0.19
• Dublin Core 0.16
• thesauri 0.16
• Dublin 0.15
• keywords 0.15
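The slides do not say how the classifier computes these scores. As a rough illustration of ranked keyword extraction only, the Python sketch below counts non-stop-word terms and scales the counts so that the most frequent term gets 1.0. The stop-word list, the function name rank_keywords and the scoring formula are assumptions; there is no stemming, compound-term detection or use of existing topics here.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real one would be per language.
STOP_WORDS = {"the", "a", "of", "and", "in", "to", "is", "are", "for"}


def rank_keywords(text, stop_words=STOP_WORDS):
    """Return (term, score) pairs, with scores relative to the top term."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in stop_words]
    counts = Counter(words)
    if not counts:
        return []
    top = max(counts.values())
    ranked = [(word, round(n / top, 2)) for word, n in counts.items()]
    # Highest score first; alphabetical order breaks ties.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


sample = ("Topic maps and metadata: topic maps are a way "
          "to organise metadata in topic maps.")
for term, score in rank_keywords(sample):
    print(term, score)
```

Scaling against the top term gives the same shape as the list above, where the best keyword scores 1.0 and the rest fall off from there.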
Example #2
• Automated classification 1.0 5
• Topic Maps 0.51 14
• XSLT 0.38 11
• compound keywords 0.29 2
• keywords 0.26 20
• Lars 0.23 1
• Marius 0.23 1
• Garshol 0.22 1
• ...
So how could this be used?
• To help users classify new documents in a CMS interface
– suggest appropriate keywords, screened by user before approval
• Automate classification of incoming documents
– this means lower quality, but also lower cost
• Get an overview of interesting terms in a document corpus
– classify all documents, extract the most interesting terms
– this can be used as the starting point for building an ontology
– (keyword extraction only)
Example user interface
• The user creates an article
– this screen is then used to add keywords
– user adjusts the proposals from the classifier
Vizigator
• Graphical visualization
• VizDesktop
– Swing app to configure
– filter/style/...
• Vizlet
– Java applet for web
– uses configuration
– loads via TMRAP
– uses “Remote” backend
Engine
Memory RDBMS Remote
DB2TM TMRAP Nav Classify
Viz Ontopoly
The Vizigator
• Graphical visualization of Topic Maps
• Two parts
– VizDesktop: Swing desktop app for configuration
– Vizlet: Java applet for web deployment
• Configuration stored in XTM file
Without configuration
With configuration
The Vizigator
• The Vizigator uses TMRAP
– the Vizlet runs in the browser (on the client)
– a fragment of the topic map is downloaded from the server
– the fragment is grown as needed
[Diagram labels: Server, TMRAP]
Ontopoly
• Generic editor
– web-based, AJAX
– meta-ontology in TM
• Ontology designer
– create types and fields
– control user interface
– build views
– incremental dev
• Instance editor
– guided by ontology
Ontopoly
• A generic Topic Maps editor, in two parts
– ontology editor: used to create the ontology and schema
– instance editor: used to enter instances based on ontology
• Built with the Web Editor Framework
– works with both XTM files and topic maps stored in RDBMS backend
– supports access control to administrative functions, ontology, and instance editors
– existing topic maps can be imported
– parts of the ontology can be marked as read-only, or hidden
Typical deployment
[Diagram labels: Application server, Engine, DB Backend, DB, DB, DB2TM, Frameworks, Users, Viewing application, Editors, Ontopoly, HTTP, TMRAP, External application]
CMS integration
• The best way to add content functionality to Ontopia
– the world doesn’t need another CMS
– better to reuse those which already exist
• So far two integrations exist
– Escenic
– OfficeNet Knowledge Portal
– more are being worked on
Implementation
• A CMS event listener
– the listener creates topics for new CMS articles, folders, etc
– the mapping is basically the design of the ontology used by this listener
• Presentation integration
– it must be possible to list all topics attached to an article
– conversely, it must be possible to list all articles attached to a topic
– how close the integration needs to be here will vary, as will the difficulty of the integration
• User interface integration
– it needs to be possible to attach topics to an article from within the normal CMS user interface
– this can be quite tricky
• Search integration
– the Topic Maps search needs to also search content in the CMS
– can be achieved by writing a tolog plug-in
Articles as topics
• Goal: associate articles with topics
– mainly to say what they are about
– typically also want to include other metadata
• Need to create topics for the articles to do this
– in fact, a general CMS-to-TM mapping is needed
– must decide what metadata and structures to include
[Diagram: article “New city council appointed”, “is about”, topic “Elections”]
Mapping issues
• Article topics
– what topic type to use?
– title becomes name? (do you know the title?)
– include author? include last modified? include workflow state?
– should all articles be mapped?
• Folders/directories/sections/...
– should these be mapped, too?
– one topic type for all folders/.../.../...?
– if so, use associations to connect articles to folders
– use associations to reproduce hierarchical folder structure
• Multimedia objects
– should these be included?
– what topic type? what name? ...
Two styles of mappings
Articles as articles
• Topic represents only the article
• Topic type is some subclass of “article”
• “Is about” association connects article into topic map
• Fields are presentational
– title, abstract, body
Articles as concepts
• Topic represents some real-world subject (like a person)
– article is just the default content about that subject
• Type is the type of the subject (person)
• Semantic associations to the rest of the topic map
– works in department, has competence, ...
• Fields can be semantic
– name, phone no, email, ...
Article as article
• Article about building of a new school
• Is about association to “Primary schools”
• Topic type is “article”
Article as concept
• Article about a sports hall
• Article really represents the hall
• Topic type is “Location”
• Associations to
– city borough
– events in the location
– category “Sports”
Two projects
The project
• A new citizen’s portal for the city administration
– strategic decision to make portal main interface for interaction with citizens
– as many services as possible are to be moved online
• Big project
– started in late 2004, to continue at least into 2008
– ~5 million Euro spent by launch date
– 1.7 million Euro budgeted for 2007
– Topic Maps development is a fraction of this (less than 25%)
• Many companies involved
– Bouvet/Ontopia
– Avenir
– KPMG
– Karabin
– Escenic
Simplified original ontology
[Diagram labels: External resource, Category, Subject, Department, Service, Employee, Borough, Form, Article, “nearly everything”, LOS, Service catalog, Payroll++, Escenic (CMS)]
Data flow
[Diagram labels: Ontopia, Escenic, LOS, Fellesdata, Payroll (Agresso), Dexter/Extens, Service Catalog, DB2TM, TMSync, Ontopoly, Integration]
Conceptual architecture
[Diagram labels: Ontopia, Escenic, Application, Oracle Portal, Oracle Database, Data sources]
The portal
Technical architecture
NRK/Skole
• Norwegian National Broadcasting (NRK)
– media resources from the archives
– published for use in schools
– integrated with the National Curriculum
• In production
– delayed by copyright wrangling
• Technologies
– OKS
– Polopoly CMS
– MySQL database
– Resin application server
Curriculum-based browsing (1)
Curriculum
Social studies
High school
Curriculum-based browsing (2)
Gender roles
Curriculum-based browsing (3)
Feminist movement in the 70s and 80s
Changes to the family in the 70s
The prime minister’s husband
Children choosing careers
Gay partnerships in 1993
One video (prime minister’s husband)
Metadata
Description
Subject
Person
Related resources
Conceptual architecture
[Diagram labels: Polopoly, Ontopia, MySQL, MediaDB, Grep, RDBMS backend, DB2TM, TMSync, HTTP, Editors]
Implementation
• Domain model in Java
– Plain old Java objects built on
  • Ontopia’s Java API
  • tolog
• JSP for presentation
– using JSTL on top of the domain model
• Subversion for the source code
• Maven2 to build and deploy
• Unit tests
What we’d like to see
The future
The big picture
[Diagram labels: Engine, Ontopoly, Portlet support, CMS integration (OKP, Escenic, other CMSs), Data integration (DB2TM, XML2TM, others), Auto-classification, Taxonomy import, Web service]
CMS integrations
• The more of these, the better
• Candidate CMSs
– Liferay (being worked on at Bouvet)
– Alfresco (might be started soon)
– Magnolia
– Inspera (possible project here)
– JSR-170 Java Content Repository
– CMIS (OASIS web service standard)
Portlet toolkit
• Subversion contains a number of “portlets”
– basically, Java objects doing presentation tasks
– some have JSP wrappers as well
• Examples
– display tree view
– list of topics filterable by facets
– show related topics
– get-topic-page via TMRAP component
• Not ready for prime-time yet
– undocumented
– incomplete
Ontopoly plug-ins
• Plugins for getting more data from external sources
– TMSync import plugin
– DB2TM plugin
– Subj3ct.com plugin
– adapted RDF2TM plugin
– classify plugin
– ...
• Plugins for ontology fragments
– menu editor, for example
TMCL
• Now implementable
• We’d like to see
– an object model for TMCL (supporting changes)
– a validator based on the object model
– Ontopoly import/export from TMCL (initially)
– refactor Ontopoly API to make it more portable
– Ontopoly ported to use TMCL natively (eventually)
Things we’d like to remove
• OSL support
– Ontopia Schema Language
• Web editor framework
– unfortunately, still used by some major customers
• Fulltext search
– the old APIs for this are not really of any use
Management interface
• Import topic maps (to file or RDBMS)
What do you think?
• Suggestions?
• Questions?
• Plans?
• Ideas?
Setting up the developer environment
Getting started
If you are using Ontopia...
• ...simply download the zip, then
– unzip,
– set the classpath,
– start the server, ...
• ...and you’re good to go
If you are developing Ontopia...
• You must have
– Java 1.5 (not 1.6 or 1.7 or ...)
– Ant 1.6 (or later)
– Ivy 2.0 (or later)
– Subversion
• Then
– check out the source from Subversion
  svn checkout http://ontopia.googlecode.com/svn/trunk/ ontopia-read-only
– ant bootstrap
– ant dist.jar.ontopia
– ant test
– ant dist.ontopia
Beware
• This is fun, because
– you can play around with anything you want
– e.g., my build has a faster TopicIF.getRolesByType
– you can track changes as they happen in svn
• However, you’re on your own
– if it fails it’s kind of hard to say why
– maybe it’s your changes, maybe not
• For production use, official releases are best
Participating etc
The project
Our goal
• To provide the best toolkit for building Topic Maps-based applications
• We want it to be
– actively maintained,
– bug-free,
– scalable,
– easy to use,
– well documented,
– stable,
– reliable
Our philosophy
• We want Ontopia to provide as much useful more-or-less generic functionality as possible
• New contributions are generally welcome as long as
– they meet the quality requirements, and
– they don’t cause problems for others
The sandbox
• There’s a lot of Ontopia-related code which does not meet those requirements
– some of it can be very useful,
– someone may pick it up and improve it
• The sandbox is for these pieces
– some are in Ontopia’s Subversion repository,
– others are maintained externally
• To be “promoted” into Ontopia a module needs
– an active maintainer,
– to be generally useful, and
– to meet certain quality requirements
Communications
• Join the mailing list(s)!
– http://groups.google.com/group/ontopia
– http://groups.google.com/group/ontopia-dev
• Google Code page
– http://code.google.com/p/ontopia/
– note the “updates” feed!
• Blog
– http://ontopia.wordpress.com
• Twitter
– http://twitter.com/ontopia
Committers
• These are the people who run the project
– they can actually commit to Subversion
– they can vote on decisions to be made etc
• Everyone else can
– use the software as much as they want,
– report and comment on issues,
– discuss on the mailing list, and
– submit patches for inclusion
How to become a committer
• Participate in the project!
– that is, get involved first
– let people get to know you, show some commitment
• Once you’ve gotten some way into the project you can ask to become a committer
– best if you have provided some patches first
• Unless you’re going to commit changes there’s no need to be a committer
Finding a task to work on
• Report bugs!
– they exist; if you find any, please report them
• Look at the open issues
– there is always testing/discussion to be done
• Look for issues marked “newbie”
– http://code.google.com/p/ontopia/issues/list?q=label:Newbie
• Look at what’s in the sandbox
– most of these modules need work
• Scratch an itch
– if there’s something you want fixed/changed/added...
How to fix a bug
• First figure out why you think it fails
• Then write a test case
– based on your assumption
– make sure the test case fails (test before you fix)
• Then fix the bug
– follow the coding guidelines (see wiki)
• Then run the test suite
– verify that you’ve fixed the bug
– verify that you haven’t broken anything
• Then submit the patch
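The test-before-you-fix workflow can be illustrated with a small, self-contained Python example. Ontopia's own tests are written in Java; everything here, including the function parse_refs and the bug it had, is invented purely to show the order of the steps.

```python
import unittest


def parse_refs(value):
    """Hypothetical helper under repair: split a space-separated list of references.

    The (imagined) buggy version used value.split(" "), which returns empty
    strings when references are separated by more than one space.
    """
    # The fix: split() with no argument collapses runs of whitespace.
    return value.split()


class ParseRefsTest(unittest.TestCase):
    def test_repeated_spaces_are_ignored(self):
        # Step 2: written first, from the assumed cause of the failure.
        # It failed against the buggy version and passes after the fix.
        self.assertEqual(parse_refs("a  b"), ["a", "b"])

    def test_single_space_still_works(self):
        # Step 4: existing behaviour must not be broken by the fix.
        self.assertEqual(parse_refs("a b"), ["a", "b"])


if __name__ == "__main__":
    unittest.main()
```

Seeing the new test fail before the fix is what shows that the test actually exercises the bug rather than passing by accident.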
The test suite
• Lots of *.test packages in the source tree
– 3148 test cases as of right now
– test data in ontopia/src/test-data
– some tests are generators based on files
– some of the test files come from cxtm-tests.sf.net
• Run with
– ant test
– java net.ontopia.test.TestRunner src/test-data/config/tests.xml test-group
Source tree structure
• net.ontopia.
– utils        various utilities
– test         various test support code
– infoset      LocatorIF code + cruft
– persistence  OR-mapper for RDBMS backend
– product      cruft
– xml          various XML-related utilities
– topicmaps    next slides
Source tree structure
• net.ontopia.topicmaps.
– core          core engine API
– impl          engine backends + utils
– utils         utilities (see next slide)
– cmdlineutils  command-line tools
– entry         TM repository
– nav + nav2    navigator framework
– query         tolog engine
– viz
– classify
– db2tm
– webed         cruft
Source tree structure
• net.ontopia.topicmaps.utils
– *      various utility classes
– ltm    LTM reader and writer
– ctm    CTM reader
– rdf    RDF converter (both ways)
– tmrap  TMRAP implementation
Let’s write some code!
The engine
• The core API corresponds closely to the TMDM
– TopicMapIF, TopicIF, TopicNameIF, ...
• Compile with
– ant init compile.ontopia
– .class files go into ontopia/build/classes
– ant dist.ontopia.jar # makes a jar
The importers
• Main class implements TopicMapReaderIF
– usually, this lets you set up configuration, etc
– then uses other classes to do the real work
• XTM importers
– use an XML parser
– main work done in XTM(2)ContentHandler
– some extra code for validation and format detection
• CTM/LTM importers
– use Antlr-based parsers
– real code in ctm.g/ltm.g
• All importers work via the core API
Fixing a real bug
• There is a failing test case in the TM/XML importer
• So let’s fix that right now...
Find an issue in the issue tracker
• (Picking one labelled “Newbie” might be good, but isn’t necessary)
• Get set up
– check out the source code
– build the code
– run the test suite
• Then dig in
– we’ll help you with any questions you have
• At the end, submit a patch to the issue tracker
– remember to use the test suite!